Despite the fact that most writers on personality
theory and psychopathology discuss daydreaming,
the highly personal and ephemeral quality of
conscious fantasy has posed
baffling problems to the investigator who seeks 
to formulate operational tests of the various 
theoretical notions in the field. The investiga- 
tion to be described here represents one phase 
of a general program of research designed to 
explore the functional role of daydreaming or 
fantasy behavior in the organization of per- 
sonality. Since much of the theory and em- 
pirical knowledge concerning daydreaming de- 
rives from the observations of individual cli- 
nicians with relatively limited subject samples, 
and under highly specialized conditions (psy- 
choanalysis or examination of psychiatric pa- 
tients), it was felt desirable to approach the 
problem from a somewhat different point of 
view. For one thing, relatively little is known 
as yet concerning the actual range and vari- 
ability of daydreaming tendencies in the nor- 
mal population. There is, furthermore, little 
systematic knowledge of the relationship of 
daydreaming tendencies to other personality 
characteristics or to certain crucial dimen- 
sions of behavioral variations in presumably 
normal individuals. While a variety of studies 
with thematic apperception type of material 
have provided useful techniques for scrutinizing
patterns of such fantasy needs as achievement
and aggression, this research has not
primarily been concerned with the more gen- 
eral role of a capacity for daydreaming. 

A synthesis of theoretical formulations and
some empirical observations by writers such 


1 This study was supported under Public Health 
Service NIMH Grant M-2279. The authors are in- 
debted to Vivian McCraven, Judith Antrobus, and 
John Antrobus, who assisted by acting as raters and 
in various scoring and computational procedures. 
as Freud, Sullivan, Mead, and Lewin have
suggested a view of daydreaming that has 
served as a basis for some of the tentative 
hypotheses of the study. The capacity to en- 
gage in daydreaming is, to some extent, a 
learned response which develops differentially 
as a function of certain patterns of parent- 
child relationships. Of particular significance 
in its development appears to be the oppor- 
tunity for identification with a benign pa- 
rental figure under circumstances in which 
intermittent reinforcement for the child’s con- 
trol of overt gratification seeking movements 
occurs. To some extent, mothers in our so- 
ciety tend to represent inhibition of impulses 
and also to foster aesthetic interest, while fa- 
thers represent action tendencies and the ex- 
ternal environment. Closer identification with 
a mother figure would therefore appear par- 
ticularly to be related to introspective tend- 
encies. 

The mode of translation of checked body 
movement into a capacity for instituting 
movement on an imaginal level is difficult to 
explain; Werner’s Sensory-Tonic Theory pro- 
vides the most specific approach to the prob- 
lem. With reinforcement both by parental fig- 
ures and by the general socioeconomic condi- 
tions or sociocultural milieu, fantasy or resort 
to verbal or imaginal means of dealing with 
delays becomes an increasingly differentiated 
ability which provides additional benefits, 
since it frees a person from dependence on 
the immediate perceptual situation and af- 
fords a fluid medium in which trial actions 
can occur with impunity. In adults, under 
optimal conditions, a differentiated capacity 
to engage in daydreaming may make it pos- 
sible for the individual to increase his aware- 
ness of self-other relationships, of his own 
action tendencies seen in time perspective,
and it may enhance the possibility of a po- 
tentially greater repertory of role relation- 
ships through imaginal practice. Pathological 
extremes in this personality dimension may 
involve either excessive resort to fantasy with 
consequent paralysis of fruitful motor ex- 
ploration of the environment or failure to de- 
velop fantasy tendencies (as apparently has 
occurred in certain institution reared chil- 
dren), with consequent inability to delay mo- 
tor responses and much self-defeating or de- 
structive motility. The question as to the op- 
timal degree or type of daydreaming remains 
as yet an unexplored area from the stand- 
point of empirical research. 

To the extent that Rorschach human movement
(M) responses may represent tendencies
to engage in daydreaming (Singer, 1955), 
support for the notion that daydreaming 
tendencies are associated with motor inhibi- 
tion, planfulness, and parental identification 
has come in a number of studies (King, 
1958; Shatin, 1953; Singer & Sugarman, 
1955; Singer, Wilensky, & McCraven, 1956). 
The present investigation represents an effort 
to move beyond the inferences concerning
Rorschach M responses to a more direct study
of daydreaming and fantasy tendencies. In 
a somewhat similar effort, Page (1957) re- 
cently reported a relationship between a ques- 
tionnaire derived daydream score and M. 


HYPOTHESES 


The general hypothesis of this study is that 
subjects (Ss) who indicate a greater fre- 
quency of daydream behavior are also char- 
acterized by greater reported frequency of 
night dreams, social introversion, and crea- 
tivity in their spontaneous reports of day- 
dreams or storytelling activity. They are, in 
addition, more likely to be identified with 
their mothers (on the basis of measures of 
assumed similarity of interests); those who 
report less daydreaming, on the other hand, 
are expected to show greater evidence of re- 
pression or denial of problems and a lesser 
tendency toward identification with their 
mothers. The inclusion of a form of manifest 
anxiety scale (Welsh’s A scale) in the bat- 
tery of procedures was carried out with the 
interest of exploring the possibility of an empirical
linkage between daydreaming and anxiety.
While it was felt that clinical evidence
suggests, generally speaking, a dampening of 
imaginative behavior during attacks of free- 
floating anxiety, it was considered likely that 
the type of behavior reported on the A scale 
might to some extent represent willingness to 
adopt a self-scrutinizing attitude or to admit 
complaints, rather than serving as an indi- 
cator of gross differences in anxiety. 

In effect, then, the conception that is ex- 
amined empirically in this paper is that one 
of the dimensions along which people vary in- 
volves the tendency or capacity to see them- 
selves in a temporal or spatial perspective and 
to engage in some form of imaginal living. 
Operationally, such a tendency or personality 
style is manifested by relative willingness to 
respond to questionnaire materials of a per- 
sonal sort, ability to admit a variety of in- 
ternal ideational activities, and greater will- 
ingness or ability to provide creative thematic 
material to ambiguously structured stimuli. 


SUBJECTS AND PROCEDURE 


For the preliminary investigation described here, a
group of 44 women, graduate students in education,
served as Ss. The Ss consisted of Negro and white
women, married and single, most of whom were
teachers. The major group breakdown employed for
this study was on the basis of the median score on 
a questionnaire of daydreaming frequency. No sig- 
nificant differences emerged between the High and 
Low Daydream groups in age, years of education, 
marital status, white-Negro ethnic groups, or socio- 
economic background. 

The following procedures were employed to obtain 
relevant measures from Ss along the hypothesized 
dimensions: 

Daydream Questionnaire. A detailed inventory con- 
cerning the patterns of daydreaming and the fre- 
quency of occurrence of specific daydreams was de- 
veloped.2 The phase of the inventory employed in 
the present investigation consisted of a series of 93 
specific daydreams. Ss were required to indicate on 
a five-point scale from Very Frequently to Practically 
Never the relative frequency with which they ex- 
perienced each daydream. A total score was derived 
for each S based on her self-weighted responses to


2A copy of the questionnaire has been deposited 
with the American Documentation Institute. Order 
Document No. 6466 from ADI Auxiliary Publications 
Project, Photoduplication Service, Library of Con- 
gress; Washington 25, D. C., remitting in advance 
$2.00 for microfilm or $3.75 for photocopies. Make 
checks payable to: Chief, Photoduplication Service, 
Library of Congress. 
each item. This Daydream score (ranging theoretically
from 93 to 558) served as the basis for dividing
Ss into High and Low Daydream groups. A cut
at the median score (173) was employed. It should 
be noted that the internal consistency of the Day- 
dream score was quite high, with Cronbach’s alpha 
yielding a coefficient of .96 for a group of 240 Ss. 
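
The scoring steps described above can be sketched in Python. This is an illustration only: the function names, the toy ratings, and the rule that scores strictly above the median fall in the High group are assumptions, not details given in the report. The alpha function uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import median, variance

def total_scores(responses):
    """Sum each S's item ratings into a single Daydream score."""
    return [sum(items) for items in responses]

def median_split(scores):
    """Label each score High or Low relative to the group median.
    Assumption: scores strictly above the median count as High."""
    cut = median(scores)
    return ["High" if s > cut else "Low" for s in scores]

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-subject item ratings."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance(total_scores(responses))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data: 4 hypothetical Ss rating 3 items (not data from the study).
toy = [[1, 2, 1], [2, 2, 3], [4, 5, 4], [5, 4, 5]]
scores = total_scores(toy)
groups = median_split(scores)
alpha = cronbach_alpha(toy)
```

With real data, each inner list would hold one S's ratings on all 93 items.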

Frequency of Night Dreams. Each S kept a log of 
her night dreams over a period of 1 month. The 
score employed here was Dream Frequency, the num- 
ber of separate nights during this period that S re- 
membered at least one reportable dream. 

Welsh’s Repression Scale—(R). On the basis of an 
extensive reanalysis of Minnesota Multiphasic Per- 
sonality Inventory responses, as well as considerable 
subsequent study, Welsh (1956) developed a scoring 
scale for MMPI which he terms the R scale. Items 
on this scale seem best characterized as reflecting for 
high scorers tendencies toward denial or repression, 
and for low scorers externalized or acting-out be- 
havior. Most scorable items are answered false for 
this scale, but Welsh’s evidence argues against a sim- 
ple response set. 

Welsh’s Anxiety Scale—(A). The A scale, derived 
similarly by Welsh, consists of items from the MMPI 
in which “disability of a dysthymic and dysphoric 
nature” with anxiety is most prominent. According 
to Welsh’s further study of profiles from diagnostic 
groups, anxiety states fall high on A, but for Ss who 
score high on both A and R, depression is a primary 
symptom; those Ss who score high on A and low on 
R reflect schizoid features.

MMPI Lie Scale. The 15 Lie items from the MMPI 
were included as an additional measure of denial 
tendencies and to provide some indication of the extent
to which the responses to the daydream questionnaire
might be subject to conscious falsification.

Social Introversion (Si). A scale for social introversion
was derived from the MMPI by Drake
(1956). Correlations with another measure of introversion
were in the .70’s for both men and women
college students; in addition, the mean for those stu- 
dents engaging in more college activities than the 
average student showed significantly less introversion 
than the mean of those participating less than the 
average amount. 

Parental Identification Patterns. As one approach 
to the issue of similarity to parents, a questionnaire 
and procedure derived from a study by Oliner (1958) 
was employed. This questionnaire consists of 44 
items dealing with a variety of interests and ac- 
tivity patterns to which Ss indicate their reactions 
on a four-point scale from “very much like” to “very 
much dislike.” These items were responded to ini- 
tially by each S for herself, after which instructions 
called for responding to the questionnaire as “Person 
I would like to be,” and then as the items applied 
to “Mother” and to “Father.” To evaluate the rela- 
tive perceived similarity of self to mother as against 
father, a score based on the formula (Self-Father) -
(Self-Mother) was derived. A high score on this
variable indicates that S reported the difference between
her own interests and those of her father to
be greater than the difference between her own interests
and those of her mother. A positive correlation
was therefore hypothesized between the (S-F) -
(S-M) score and degree of fantasy, such that the
greater the perceived similarity to mother rather 
than to father, the greater the fantasy tendency. A 
score based on the absolute difference between per- 
ceived interests of fathers and mothers (F-M) was 
also employed. 
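
The two identification scores can be illustrated with a short Python sketch. The report does not state how the discrepancy between two sets of ratings was computed; the sum of absolute item differences used here, the numeric coding of the four-point scale (1 = "very much like" to 4 = "very much dislike"), and all names and ratings are assumptions made for the example.

```python
def discrepancy(profile_a, profile_b):
    """Sum of absolute item-by-item differences between two rating profiles.
    Assumption: the report does not specify this discrepancy measure."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))

def identification_scores(self_r, mother_r, father_r):
    """Return the (S-F)-(S-M) score and the Father-Mother discrepancy."""
    s_f = discrepancy(self_r, father_r)
    s_m = discrepancy(self_r, mother_r)
    f_m = discrepancy(father_r, mother_r)
    return {"(S-F)-(S-M)": s_f - s_m, "F-M": f_m}

# Hypothetical ratings on 5 interest items (the questionnaire has 44).
self_r = [1, 2, 4, 3, 1]
mother_r = [1, 2, 3, 3, 2]
father_r = [4, 1, 1, 2, 3]
result = identification_scores(self_r, mother_r, father_r)
```

A positive (S-F)-(S-M) value, as in this toy case, means the S's own ratings lie closer to those she attributes to her mother than to those she attributes to her father.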

Creativity of Spontaneous Daydream and Story- 
telling. At the conclusion of the questionnaire, Ss 
were asked to write an account of an actual day- 
dream and also to make up a spontaneous original 
story. The daydream and original story were then 
scored for Creativity, i.e., a measure of the introduction
of novel materials, characters, time and space
sequences, and emotional vividness. Using a definition 
of creativity in terms of the above criteria, two ex- 
aminers independently scored all protocols along a 
five-point scale for Creativity. Rater reliability for 
a larger sample of 240 Ss had been evaluated for 
this variable and was felt to be satisfactory, since in 
only 11 out of 240 ratings were there differences as 
great as two points on the five-point scale, and no 
difference as great as three points. The average of 
the two raters’ scores was employed for the final 
Creativity score. 
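
The agreement check and the averaging of the two raters' scores can be sketched as follows. The function names and the ratings are hypothetical and serve only to show the arithmetic described above.

```python
def rating_gaps(rater_a, rater_b):
    """Absolute difference between the two raters on each protocol."""
    return [abs(a - b) for a, b in zip(rater_a, rater_b)]

def count_gaps_at_least(rater_a, rater_b, points):
    """Number of protocols on which the raters differ by `points` or more."""
    return sum(1 for gap in rating_gaps(rater_a, rater_b) if gap >= points)

def final_scores(rater_a, rater_b):
    """Final Creativity score: the mean of the two raters' scores."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]

# Hypothetical five-point ratings for 5 protocols (not data from the study).
rater_a = [3, 4, 2, 5, 1]
rater_b = [3, 2, 2, 4, 1]
two_point_gaps = count_gaps_at_least(rater_a, rater_b, 2)
averaged = final_scores(rater_a, rater_b)
```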

Needs Achievement, Self-Aggrandizement, and Af- 
filiation. In addition to scoring the structural charac- 
teristics of the story and daydream, some attempt 
was made to consider the specific thematic content 
of these materials. Three fantasy needs emerged with 
enough variability in most of the records to permit 
a quantitative rating. These were Need Achievement, 
scored essentially along the lines laid out by McClelland
(1958) and Atkinson (1958), Need Self-Aggrandizement
(employed here as representing obtaining
material possessions or display items, as well as
high social status without particular effort or achieve- 
ment), and Need Affiliation (employed here to in- 
clude gregariousness, need for social warmth, and 
sex). It was thought that Need Achievement in particular
would relate to degree of daydream activity.

The need scores were rated independently along a 
five-point scale and raters’ results were averaged to 
give a final score for each S on each need. While 
these scores could not be considered experimentally 
independent of the Creativity score, the intercorrela- 
tion data below suggest that they cannot be consid- 
ered merely as reflections of the Creativity score. 

Vocabulary Score. To obtain a brief estimate of 
verbal intelligence, the multiple-choice vocabulary 
test from the IER Intelligence Scale CAVD was in- 
cluded. This test correlates .50 with a general in- 
telligence factor for a sample of adult males (Thorn- 
dike, Norris, & Morrill, 1952), and it provides a 
simply scored indicator of gross intellectual differ- 
ences. No specific hypotheses concerning the role of 
intelligence were formulated for this study, but the 
Vocabulary score was employed to evaluate the likelihood
that particular correlations which emerged
might merely represent intelligence differences.


RESULTS AND DISCUSSION


Following dichotomization of the distribution
of scores for each of the above variables at
their medians, tetrachoric r’s were calculated. 
The matrix of intercorrelations is presented in 
Table 1. 
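
The report does not say how the tetrachoric r's were obtained. As one illustration, when both variables are cut exactly at their medians (so that each marginal proportion is one half) and bivariate normality is assumed, the coefficient has a closed form: r = sin(2*pi*(p - 1/4)), where p is the proportion of Ss above the median on both variables. The function name and the cell counts below are hypothetical.

```python
import math

def tetrachoric_median_split(both_high, n):
    """Tetrachoric r for two variables each dichotomized at its median.

    Relies on the bivariate-normal identity
        P(X above median, Y above median) = 1/4 + asin(r) / (2 * pi),
    which holds only when both cuts fall exactly at the medians.
    """
    p = both_high / n
    return math.sin(2 * math.pi * (p - 0.25))

# Hypothetical 2 x 2 counts for N = 44 (22 Ss above each median).
r_independent = tetrachoric_median_split(11, 44)  # a quarter of Ss in the high-high cell
r_moderate = tetrachoric_median_split(16, 44)
```

With unequal marginals the closed form no longer applies and an iterative or tabled solution is needed.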

Inspection of Table 1 reveals general sup- 
port for the hypotheses in the sense that sig- 
nificant correlations in the predicted direction 
emerged between the Daydream scores and 
Dream Recall Frequency, Perceived Similar- 
ity to Father minus Perceived Similarity to 
Mother, Creativity of Spontaneous Daydream 
and Storytelling material, and Need Achieve- 
ment. Significant positive correlations also 
emerged between Daydream score and the 
A scale, Need Self-Aggrandizement, Need Af- 
filiation and Father—Mother discrepancy. The 
Repression and Lie scales correlate negatively 
(at insignificant levels) with Daydream score, 
while Social Introversion correlates positively 
as predicted, but at a nonsignificant level. 
A simple graphic cluster analysis following 
Tryon (1939) reveals a fairly clear-cut patterning
of the variables in this study. Daydreams,
Dreams, Social Introversion, Creativity,
Need Achievement, (Father-Mother),
(Self-Father) - (Self-Mother), and the Anxiety
scale show a distinct cluster roughly
paralleling each other in the extent of inter- 
correlations in the matrix. The Repression 
and Lie scales appear to form the negative 
pole of what appears as a bipolar cluster. 
Only the Anxiety scale was correlated sig- 
nificantly with Vocabulary; Need Affiliation 
tends to follow the major cluster with some 
variations, and Need Self-Aggrandizement 
does so to a much lesser extent. 

Although paralleling the Daydream scale 
through most of the matrix, the A scale re- 
veals a unique pattern correlating negatively 
with Need Self-Aggrandizement and Need 
Affiliation and Vocabulary. Thus, Ss who re- 
port many problems tend to show fewer fan- 
tasy themes dealing with possession and status 
attainment or need for interpersonal contact. 
Social Introversion shows a somewhat simi- 
lar pattern to anxiety, with a particularly 
high positive correlation emerging with Need 
Achievement, while a moderate negative cor- 
relation is revealed with Need Self-Aggran- 
dizement. 


TABLE 1 


INTERCORRELATIONS BETWEEN DAYDREAM SCORE AND OTHER VARIABLES 


Variables (correlation coefficients not recoverable from the source):

1. Daydreams
2. Dreams
3. Repression
4. Anxiety scale
5. Lie scale
6. Social Introversion
7. Father-Mother
8. (Self-Father) - (Self-Mother)
9. Creativity
10. Need Achievement
11. Need Self-Aggrandizement
12. Need Affiliation
13. Vocabulary

Easily the dominant variable in the cluster
based on size of intercorrelations is Need 
Achievement. One might infer from this re- 
sult that much of the achievement need trans- 
lated into responsiveness to the test situation 
could account for High Daydream and Night 
Dream scores, as well as the Creativity and 
Anxiety scores. This causal type of explanation
founders when the high correlation between
Achievement and (S-F) - (S-M) is considered,
since the latter variable does not lend
itself to bias resulting from an achievement 
or acquiescence set. The concomitant varia- 
tions of the cluster in question seem account- 
able on a more complex or subtle basis, there- 
fore. 

As a further check on this point, an analy- 
sis was made of identification choices of the 
Ss. As part of the questionnaire, the women 
in this group were asked to list movie or stage 
personages, historical figures, and characters 
from literature whom they emulated or wanted 
to be like. Analysis of these choices for the 
High and Low Daydream score groups re- 
vealed a significant difference in the “mascu- 
linity” or “femininity” of identification figures. 
The Low Daydream Ss chose significantly 
more male figures or women engaged in 
largely masculine pursuits or characterized 
by traits thought of as predominantly mascu- 
line (e.g., Joan of Arc, Elizabeth I of Eng- 
land, Amelia Earhart). The greater “feminine 
role” or maternal identification of the High 
Daydream Ss, as well as their high Creativity 
scores in thematic material and the high 
Need Achievement scores, suggests support for 
the suggested relationship between acceptance 
of inner life or long-range aspirations on the 
basis of maternal identification. 

Evidence supporting the hypothesis relating 
maternal identification and daydreaming has 
also emerged for a group of male Ss of com- 
parable background. This male sample did 
not undergo the same experimental procedures 
except for the Daydream questionnaire and 
the self-ratings and will not be reported at 
length here. Daydream frequency was posi- 
tively correlated with Self-Father discrepancy 
(r = .30, N = 64) and negatively correlated 
with Self-Mother discrepancy (r = — .19, N 
= 68). The results suggest that even for these 
men, the tendency to perceive oneself as simi- 


lar to one’s mother and unlike one’s father is 
associated to some extent with reported day- 
dreaming frequency. These results suggest the 
fruitfulness of exploring daydreaming tend- 
encies and self-awareness variables in terms 
of their linkages to family constellations and 
patterns of learning within the family situa- 
tion or cultural milieu. Only in this way can 
we hope to move beyond mere classification 
toward more theoretically derived statements 
concerning the relationships of a dimension 
such as “acceptance of inner life” or “self-awareness”
to the general framework of personality
development.

In conclusion, it appears from these data 
that there is a general clustering of the variables
in a manner suggesting that these women
differ along a dimension which might be
termed self-awareness. We can, of course, 
never be sure that High Daydream Ss actu- 
ally do produce more daydreams and have 
more conscious achievement-aspirations than 
Low Daydream Ss. Operationally, we observe 
only that they accept these phenomena as 
part of their life-space and report them more 
readily. It appears likely, however, that the 
difference in attention and admission to others 
of these inner experiences may be the psychologically
significant phenomenon and that
quantitative differences in extent of inner liv- 
ing may be scientifically indeterminable. 


SUMMARY


The investigation described here represents 
one approach to a study of the functional 
role of daydreaming as a dimension of be- 
havior. It was hypothesized on the basis of 
various theoretical formulations that Ss who 
report a high frequency of daydreaming be- 
havior also indicate greater frequency of re- 
call of night dreams, creativity in storytelling, 
Need Achievement, and, possibly, willingness 
to admit anxieties or complaints. These High 
Daydream Ss were expected also to demon- 
strate greater assumed similarity to their 
mothers than to their fathers and less evi- 
dence of repression or denial (MMPI Lie 
scale) than low frequency daydreamers. A 
group of 44 adult female graduate students 
responded to a variety of questionnaire ma- 
terials and also reported frequencies of day- 
dreams and night dreams. The test materials 
included MMPI Anxiety (A) scale, Repression
scale, Lie scale, and Social Introversion
(Si), as well as a series of interest items to 
be filled out by each S for herself, ideal self, 
and as her mother and her father would have 
done. Spontaneous storytelling and daydream 
material were also elicited and scored for 
Creativity, Need Achievement, Need Affilia- 
tion, and Need Self-Aggrandizement. The re- 
sults supported the general hypothesis, indi- 
cating that daydream frequency, night dream 
recall frequency, thematic creativity, Need 
Achievement, anxiety, and relatively greater 
identification with mother than with father 
intercorrelated positively, while Repression 
and Lie scales both correlated negatively with 
the other variables in the cluster. The data 
suggest that High and Low Daydreamers 
differ along a dimension which might be 
termed self-awareness, or acceptance of inner 
experience. 


In an experimental study, Williams (1947)
found a highly significant relationship between 
Rorschach indices of intellectual control and 
performance under stress on the Wechsler- 
Bellevue Digit-Symbol test. His results sup- 
ported the validity of the Rorschach con- 
structs, indicated that the Rorschach test is 
a practical instrument for predicting behavior 
under stress, and implied that efficiency of 
performance under conditions of stress is 
mainly a function of intellectual control. This 
was pointed out by Carlson and Lazarus 
(1953), who questioned the representative- 
ness of Williams’ findings because of his ex- 
perimental procedures and the conflicting re- 
sults reported in related studies. They re- 
peated Williams’ experiment and did not 
obtain comparable correlations. These dis- 
crepant findings are similar to those contained 
in a review article on stress by Lazarus, 
Deese, and Osler (1952). 

The present research modifies Williams’ experimental
procedures in two important respects: a real-life
stress situation was produced, and the subjects
(Ss) had a common motivation to succeed.
Measures of self-acceptance were also
obtained from the Ss. This personality vari- 
able was studied because it is a basic com- 
ponent of psychological theory which indi- 
cates that effectiveness of behavior is directly 
related to self-acceptance (Rogers, 1951; 
Snygg & Combs, 1949; Symonds, 1951). Of 
further interest is the congruity between the 
characteristics of a self-accepting person and 
the concept of a mature individual having 


1 This paper is based upon a doctoral dissertation 
completed at the University of Pittsburgh in 1954. 
The writer is indebted to members of his thesis com- 
mittee, J. Matthews, A. W. Bendig, H. W. Goodman, 
and A. D. Lazovik, for their guidance and encourage- 
ment. 


adequate intellectual control, as depicted by 
Beck (1945). The prediction was made that 
acceptance of self would be highly correlated 
with the Rorschach indices of intellectual 
control. 


METHOD 
Subjects 


All the pledges (N = 30) of a campus fraternity 
volunteered to participate in this study, which was 
described as being done to investigate certain impor- 
tant questions facing clinical psychologists. These 
pledges had been selected by the fraternity from a 
large number of applicants; they were all highly 
motivated to become active members. Achieving active
status was dependent upon the over-all impression
made by each pledge on the fraternity members.
This factor was especially crucial during the time 
this study was being done, since the Ss were in the 
trial stage of their pledge period. Consequently, an 
important motivation common to this group of Ss 
was to make a favorable impression on the fra- 
ternity members. 


Procedure 


In the first phase of the study the Rorschach test 
was administered individually to each of the pledges, 
based on Beck’s (1944) procedures. This took ap- 
proximately 1 month. 

The Ss then met as a group in a classroom to com- 
plete five practice trials on the Wechsler-Bellevue 
Digit-Symbol test (Wechsler, 1944). During the administration
of the sample form of the Digit-Symbol
test the experimenter (E), in giving instructions to
the group, referred to the identical sample form 
which had been reproduced on the blackboard. Each 
of five trials of the Digit-Symbol test was completed 
within 90 seconds, with a 1-minute rest period be- 
tween each trial. The number of items on the Digit- 
Symbol test had been increased so that the complete 
test would not be finished in the allotted time. The 
Ss then completed the Berger Scale of Expressed Ac- 
ceptance of Self (Berger, 1952) which yielded the 
Control Level scores. The Berger scale is self-ad- 
ministering and is composed of 36 items. Selection 
of these items was made according to the definition 
of a self-accepting person derived in an earlier study 
by Sheerer (1949). The respondent rates each item 
on a scale from 1 to 5 depending upon how true he
feels the item to be in describing himself. To be valid, 
use of the Berger scale requires that the Ss be un- 
identified. As it was necessary for the purpose of this 
study to identify the pledges in order to compare 
their responses under control and stress conditions, 
the Berger scale was covertly coded. 

The second group session took place the day after 
the first meeting. Twelve volunteers from the group 
of assembled pledges were taken to another room, 
where they completed six additional trials of the 
Digit-Symbol test. A 3-minute rest period was taken 
at the end of the third trial, instead of the standard 
1-minute interval. The results of the last three trials
of this series yielded the Digit-Symbol Control Level 
scores. 

For the final phase of the experiment, the 12 Ss 
were taken to the psychology department’s labora- 
tory in another part of the building to take three 
more trials of the Digit-Symbol test (Stress Level 
scores). This procedure of having the Ss complete
the Digit-Symbol test under control and stress con- 
ditions yielded the behavioral criterion (decrement 
in performance on the Digit-Symbol test) patterned 
after Williams’ study. In addition, the current cri- 
terion measure was designed to incorporate and em- 
phasize psychological stress variables which would be 
stressful to a pledge to the degree that he was lack- 
ing in the characteristics of a self-accepting person 
as defined by Berger (1952). This was accomplished 
by exposing the Ss to psychological stress which com- 
prised externally applied pressures as a standard for 
behavior, maximized their need to deny or distort 
unacceptable personal characteristics, and indicated 
that public comparisons would be made of their in- 
dividual performance results. 
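
The behavioral criterion can be illustrated with a small Python sketch. The passage states only that the criterion is the decrement in Digit-Symbol performance from the Control Level trials to the Stress Level trials; taking the difference between the mean of the three control trials and the mean of the three stress trials is an assumption made here, as are the function name and the scores.

```python
from statistics import mean

def performance_decrement(control_trials, stress_trials):
    """Mean Control Level score minus mean Stress Level score.
    A positive value means performance dropped under stress.
    Assumption: the exact decrement formula is not given in this passage."""
    return mean(control_trials) - mean(stress_trials)

# Hypothetical Digit-Symbol scores for one S (not data from the study).
control = [52, 54, 56]  # last three trials of the second session
stress = [48, 50, 52]   # three trials taken in the laboratory
decrement = performance_decrement(control, stress)
```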

In the laboratory, two identical continuous panels 
had been constructed and placed back-to-back 5 
inches apart in the center of a long table, with six 
positions on each side. Each section of the panel fac- 
ing the S included a white light and a red light. Ex- 
tending perpendicularly from the main panel on each 
side of a given position was a smaller panel. All the 
panels were 1 foot high. Consequently, when the Ss 
were completing the experimental tasks, they could 
not see the progress made by any of the adjacent 
pledges. All the lights in the laboratory were on; 
also, two No. 1 photoflood lamps mounted on tri- 
pods were placed on the table, one at each end, and 
beamed at the pledges without causing a direct glare. 
At one end of the long table in full view of the Ss 
was the shocking apparatus. This consisted of an in- 
clined panel board which had separate switches for 
the lights and the electric shock, and was the ter- 
minal point for the maze of wires which led from 
the apparatus to the individual positions. 

After the Ss were seated, electrodes were attached 
to the nonwriting hand of each pledge by E. He 
then stood at the end of the table near the shocking 
apparatus, where he could easily be seen by all the 
pledges. The following instructions were given, pat- 
terned after those of Williams: 


You are now being observed by a number of
psychologists who are taking notes and continuous
photographs of all your reactions throughout the 
remainder of this experiment. [At one end of the 
table stood a graduate student who operated a 
portable motion picture camera. Immediately be- 
hind each group of three Ss stood a graduate stu- 
dent who served as a judge; the four judges in- 
cluded one female. There was no communication 
between them while the Ss completed the tests.
During the experiment and the rest intervals, they 
took sham notes of the Ss’ behavior. At the end of 
each trial the judges shifted position, which served 
to randomize the effect of any particular judge on 
the S.] All directions are to be followed implicitly.
Rest your arm attached to the electrodes on the 
table and keep it there from now on. You will 
notice that the electrodes on your arm are now 
connected to the panel before you. [White light 
turned on and left on.] The white light that has 
just gone on indicates that our shock apparatus 
has been turned on. You are connected to this ap- 
paratus. During the following period you may re- 
ceive a strong electric shock whenever the ob- 
servers feel that your test performance is not up 
to our standards. Whenever the red light goes on, 
you are not meeting our standards and you are in 
danger of being shocked, like this. [Red light 
turned on individually for each pledge, and fol- 
lowed by an electric shock. Red light turned off.] 
Based upon the psychologists’ evaluation of your 
reactions and your tests, each of you will be com- 
pared to all the rest of the pledges. You will be 
compared for personality factors and intelligence.
These lists will be posted in your fraternity house 
2 weeks from now, so that all of you can see how 
you compare with the rest of the group. [These 
results posed a realistic threat to the Ss, who were 
all highly motivated to make a favorable impression
on the fraternity members in order to achieve
active status in the fraternity.] Now pick up your
pencil and write your name, seat number, and
group number in the upper right-hand corner. You
will see that this is the same test form you just
took downstairs. Your instructions are the same as
before. At the signal, “Go,” turn the sheet over
and work as fast as you can until you are told to
stop.

Concentrate on your work. Remember, you 
are being observed and continually photographed.
Your work will be compared with the rest of the
pledges, and you will be shocked whenever your
work falls below our standards. Get set for Trial 1.
Go!

Three seconds after the Go signal, the red light
was turned on. After a 5-second interval, the electric
shock was administered; immediately afterward, the
red light was turned off. (The electric shocks were
delivered through the electrodes by an electronic
interval timer which activated a regulated shock unit.
Each pledge could be individually shocked, since the
shocking apparatus was connected to a 12-pole position
switch. After the shock had been on for 0.4
second, it was automatically turned off. Simultaneously,
the red light was turned off. The electric shock
was then given to the S whose position number next
appeared on E’s list, which had been previously
derived by chance selection of the panel position
numbers.) Ninety seconds after starting Trial 1, the
pledges were told to “Stop.” During the 1-minute
interval before being instructed to start again, the
pledges were again informed of the use to be made
of their performance results. The same procedure
was repeated in Trials 2 and 3.

After Trial 3 of the Digit-Symbol test was concluded,
the electrodes were removed from each S’s
hand. At that time, to ascertain the sensitivity of
the Berger scale to situational influences, the test
was taken by the Ss under stress conditions (Stress
Level scores). They were given instructions similar
to those used during the Digit-Symbol testing. As
the Ss left after finishing the Berger scale, they were
cautioned not to return to the room where the other
pledges were gathered. The identical experimental
procedures were repeated with the remaining pledges,
the second group including 12 Ss, and the third
group consisting of 6 Ss.

Shortly after completion of the study, E met with
the pledges and explained to them the purposes of
the research. They were assured of the confidentiality
of the data, and that their performance in the study
would have no bearing on their status in the
fraternity.


RESULTS 


The Rorschach records were individually 
scored according to Beck (1944) and the fol- 
lowing three measures of intellectual control 
were derived: F+% for the total record, 
F+% for the color cards alone, and Sum C/
Total C. A high F+% is held to indicate a
high degree of intellectual control, while the
converse relationship is stated for the Sum C/
Total C measure (Beck, 1944).


TABLE 1

SUMMARY OF RORSCHACH PERFORMANCE

                                Experimental group
Rorschach Category            Mean     Range     SD

Sum C/Total C
  Goldfarb
  Carlson & Lazarus
  Williams
F+% Total
  Goldfarb                             52-100
  Carlson & Lazarus                    50-100
  Williams                             70-100
F+% Color Cards
  Goldfarb                             0-100
  Carlson & Lazarus                    0-100
  Williams                             50-100


TABLE 2

SUMMARY OF DIGIT-SYMBOL TEST PERFORMANCE

                                Experimental Group
Digit-Symbol Measure          Mean     SD

1. Control Level
  Goldfarb                             17.16
  Carlson & Lazarus                    16.69
2. Stress Level
  Goldfarb
  Carlson & Lazarus
Stress Decrement (1 minus 2)
  Goldfarb
  Carlson & Lazarus
  Williams
t test
  Goldfarb
  Carlson & Lazarus

* Significant at .01 level.

Table 1 shows that the Ss’ scores for the 
specified Rorschach measures are very con- 
sistent with those reported by Williams and 
by Carlson and Lazarus. This signifies that 
similar groups of Ss were used in all three 
studies. 

The group of pledges reached a plateau of 
no further improvement on the Digit-Symbol 
test by Trial 11. This finding corresponded 
with the results found by Williams, which 
was not the case in the Carlson and Lazarus 
study. The Stress Decrement scores in the 
current study very likely represented a decre- 
ment from a level of maximum performance 
for the pledges. The measure of Stress Decre- 
ment was computed by determining the mean 
number of digits correctly completed for 
Trials 9, 10, and 11 (Control Level) minus 
the mean number correctly completed for 
Trials 12, 13, and 14 (Stress Level). In this
study, and in the other two, the magnitude of
the Stress Decrement measures indicated that
all Es produced comparable stressful conditions
by their procedures. These data are
contained in Table 2.


TABLE 3

INTERCORRELATIONS AMONG DIGIT-SYMBOL TEST MEASURES

                                                    Corrected
                              Control   Stress      Stress
Digit-Symbol measure          Level     Decrement   Decrement

Stress Decrement
  Goldfarb
  Carlson & Lazarus
  Williams
Corrected Stress Decrement
  Goldfarb
  Carlson & Lazarus
Improvement under Stress a
  Goldfarb
  Carlson & Lazarus
Stress Maximum b
  Goldfarb
  Carlson & Lazarus
  Williams
Stress Level
  Goldfarb                    .86
  Carlson & Lazarus           .87

a Last stress trial minus first stress trial.
b Highest control trial minus lowest stress trial.


Perhaps a more valid criterion measure of 
experimental stress for the Ss’ performance 
on the Digit-Symbol test is the Stress Maxi- 
mum score found by subtracting the first 
stress trial (Trial 12) from the last control 
trial (Trial 11). It is suggested that Trials 
12, 13, and 14 comprise not only decrement 
in performance under stress, but also include 
the factor of recovery from stress. The fol- 
lowing analysis supports this hypothesis. The 
difference between the successively larger
mean scores made on Trials 12 and 13 was
found to be statistically significant beyond
the .01 level (t = 7.56). However, the mean
difference in scores made on Trials 13 and
14 was found to be insignificant (t = .41).
A possible interpretation of these findings is 
that Trial 12 represented the primary reac- 
tion to the stress situation, while Trials 13 
and 14 also reflected the Ss’ efforts to recover 
from and stabilize their reactions to the stress 
condition. If one were to use the Stress Maxi- 
mum score as a measure of experimental 
stress, then the average decrement for this 
sample was a drop of 15 points. This ex- 
tremely significant decrement in the Ss’ per- 
formance on the Digit-Symbol test very likely 
indicates the period of the greatest influence 
of the stress factors. Since the Stress Maximum
scores correlated very highly with the
Stress Decrement scores (r = .93),² as was
also found with the Williams and the Carlson
and Lazarus studies, they were not correlated
with the other measures.
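
The comparison of successive stress trials described above can be sketched as a paired t test on per-subject scores. This is only an illustration under stated assumptions: the score lists are invented, the function name `paired_t` is hypothetical, and the source does not say exactly how its t values were computed.

```python
import math
import statistics


def paired_t(first, second):
    """Return the paired t statistic for (second - first), df = n - 1."""
    if len(first) != len(second):
        raise ValueError("both trials need one score per subject")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(n))


# Invented scores for five subjects on two successive stress trials.
trial_12 = [40, 45, 38, 50, 42]
trial_13 = [46, 49, 45, 55, 47]
t_value = paired_t(trial_12, trial_13)
print(f"t = {t_value:.2f} with df = {len(trial_12) - 1}")
```

A large positive t for Trial 13 minus Trial 12, followed by a small t for Trial 14 minus Trial 13, is the pattern the text reads as initial disruption followed by recovery.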

The intercorrelations among the perform- 
ance test measures are presented in Table 3. 
A correlation of .36 is required for significance 
at the .05 level. In this research, as in the one 
by Carlson and Lazarus, the degree of Stress 
Decrement on the Digit-Symbol test is corre- 
lated with the Control Level. To eliminate the 
influence of the Control Level of performance, 
a statistical procedure patterned after that of 
Carlson and Lazarus (1953, p. 250) was used 
to derive the Corrected Stress Decrement 
scores. Although these results were signifi- 
cantly correlated at the .05 level with the 
measure of Improvement under Stress, the 
latter scores were also correlated with the 
various personality indices in order to com- 
pare them with the Carlson and Lazarus find- 
ings. The Improvement under Stress scores 
were obtained by subtracting the score for 
Trial 12 (first trial under stress) from the 
score for Trial 14 (last trial under stress). 
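
The derived Digit-Symbol scores defined in the text are simple differences of trial scores, and a short sketch may make the definitions easier to compare. The trial scores below are invented, and `stress_measures` is a hypothetical helper name. Stress Maximum follows the definition given in the text (Trial 11 minus Trial 12).

```python
from statistics import mean


def stress_measures(trials):
    """Compute the derived Digit-Symbol scores for one subject.

    `trials` maps trial number to digits correctly completed.
    """
    control_level = mean(trials[t] for t in (9, 10, 11))
    stress_level = mean(trials[t] for t in (12, 13, 14))
    return {
        "control_level": control_level,
        "stress_level": stress_level,
        "stress_decrement": control_level - stress_level,
        "stress_maximum": trials[11] - trials[12],
        "improvement_under_stress": trials[14] - trials[12],
    }


# Invented scores for a single subject; only Trials 9-14 are needed here.
example = {9: 60, 10: 62, 11: 64, 12: 48, 13: 55, 14: 56}
scores = stress_measures(example)
for name, value in scores.items():
    print(f"{name}: {value}")
```

Because Stress Decrement averages over Trials 12 to 14, any recovery on the later stress trials shrinks it relative to Stress Maximum, which is the point the text makes about the two measures.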

Table 4 indicates that in this study no significant
correlations were found between the
Rorschach indices of intellectual control and
performance under stress, as measured on the
Digit-Symbol test. These findings do not support
the hypotheses that the Rorschach test
can be used to predict behavior under stress,
and that reactions to stress are mainly a function
of the personality variable of intellectual
control. Furthermore, the validity of these
Rorschach constructs is not confirmed. The
results of the current study are in marked
contrast to those found by Williams, but are
consistent with the correlations obtained by
Carlson and Lazarus.

2 All coefficients of correlation reported in this
study were derived by the Pearson product-moment
method.


TABLE 4

CORRELATIONS BETWEEN DIGIT-SYMBOL TEST
MEASURES AND RORSCHACH MEASURES

                              Sum C/    F+%      F+% Color
Digit-Symbol Measure          Total C   Total    Cards

Stress Decrement
  Goldfarb
  Carlson & Lazarus
  Williams
Corrected Stress Decrement
  Goldfarb
  Carlson & Lazarus
Improvement under Stress
  Goldfarb
  Carlson & Lazarus


TABLE 5

SUMMARY OF BERGER SCALE PERFORMANCE

                                        Experimental Group
Berger Scale Measure                    Mean         SD

1. Stress Level                         145.03       14.07
2. Control Level                        137.87       17.29
                                        (135.50)a    22.36
Stress Level minus Control Level b      7.16         9.09
t test                                  4.09*

a Data in parentheses obtained by Berger with day
students.
b Measure of Increase in Self-Acceptance (Stress).
* Significant at .01 level.


The mean difference in the Ss’ scores on the 
Berger scale obtained under control and stress 
conditions was statistically significant at the 
.01 level. This finding supports the validity 
of the psychological stress experienced by the 
Ss; indicates that the Berger scale provides 
a sensitive measure of self-acceptance; and 
reveals that the pledges as a group presented 
themselves as being more self-accepting, i.e.,
more mature and independent, when they 
learned that their responses to the Berger 
scale would be made public. Comparison of 
the present results derived under control con- 
ditions with the findings obtained by Berger 
with a group of day college students, a 
sample comparable to the pledges used in 
this study, indicated consistency of results. 
Table 5 lists these results. 

Analysis of the intercorrelations among the 
Berger scale measures indicated that Increase 
in Self-Acceptance (Stress) was significantly 
correlated at the .05 level with the Control 
Level (r = .37). To eliminate the influence of 
the Control Level of performance, Carlson 
and Lazarus’ (1953, p. 250) statistical pro- 
cedures were followed to derive the measure 
designated the Corrected Increase in Self-Acceptance
(Stress). The latter measure correlated -.05
with the Control Level scores, and .79 with the
Increase in Self-Acceptance (Stress) scores. The
correlation between the Control Level scores and
the Stress Level scores was .84.

The correlations between the Berger scale 
measures and the Rorschach indices of intel- 
lectual control included one statistically sig- 
nificant relationship. The measure of Cor- 
rected Increase in Self-Acceptance (Stress) 
was found to be negatively correlated at the 
.05 level with the F+% on the Rorschach 
color cards (r = -.38). This suggests that
the Ss with lesser degrees of intellectual con- 
trol tend to present themselves as being more 
self-accepting under conditions of stress. 
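
The significance statements attached to these correlations can be checked against the critical value of r, which follows from the t distribution as r = t / sqrt(t^2 + df). The sketch below assumes all 30 pledges entered each correlation (df = 28) and takes the two-tailed .05 critical t of 2.048 from a standard table. The function name `critical_r` is hypothetical.

```python
import math


def critical_r(t_crit, n):
    """Smallest |r| reaching significance, from r = t / sqrt(t^2 + df)."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed .05 critical t for df = 28, taken from a standard t table.
T_CRIT_05_DF28 = 2.048
threshold = critical_r(T_CRIT_05_DF28, n=30)
print(f"critical r = {threshold:.2f}")
for r in (0.37, -0.38, -0.05):
    verdict = "significant" if abs(r) >= threshold else "not significant"
    print(f"r = {r:+.2f}: {verdict} at the .05 level")
```

Under these assumptions the threshold rounds to the .36 quoted earlier for Table 3, so correlations of .37 and -.38 only just clear it.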

No significant relationships were found be- 
tween Ss’ performance on the Digit-Symbol 
test and the measures of self-acceptance ob- 
tained under control and stress conditions. 
This finding neither supports the hypothesis 
that a major personality correlate of behav- 
ior under stress is the variable of self-accept- 
ance, nor does it indicate that the Berger 
scale can be used to predict performance un- 
der stressful conditions. 


DISCUSSION 


No significant relationships were found in 
the present study between performance un- 
der stress and the personality variables of in- 
tellectual control and self-acceptance. These 
Rorschach findings match those of Carlson 
and Lazarus, but differ markedly from Wil- 
liams’ results. 

The possibility that the experimental pro- 
cedures may have obscured significant rela- 
tionships merits further study. This concerns 
the practice of using a single score as a meas- 
ure of the S’s performance under stress which, 
in turn, is correlated with other scores repre- 
senting personality variables. A performance 
score may mask several important components 
and patterns of behavior. These may not only 
vary between Ss who achieve identical scores 
but, if separately correlated with selected com- 
ponents of the personality variables, could 
conceivably yield significant relationships (see 
the excellent discussion of this problem by 
Lazarus et al., 1952). 

The increased mean score made by the 
pledges taking the Berger scale under conditions
of stress may be interpreted as a defensive
reaction. This is based on the premise
that the pledges experienced psychological 
stress in anticipation that their personal char- 
acteristics were not as acceptable to the fra- 
ternity members as was indicated on the 
Berger scale under control conditions. From 
this viewpoint, rather than being solely a 
measure of self-acceptance, the increased mean 
scores include an emotional component of de- 
fensive behavior. 

No relationship was found between the 
Berger scale taken under control conditions 
and the Rorschach test. This finding does not 
support the initial hypothesis that greater de- 
grees of self-acceptance are associated with 
increased intellectual control. The significant 
negative correlation found between the vari- 
ables of Corrected Increase in Self-Acceptance 
(Stress) on the Berger scale and F+% on 
the color cards of the Rorschach test may be 
viewed as suggesting that the increased mean 
score of self-acceptance obtained under con- 
ditions of stress primarily reflects defensive 
behavior, and increased defensiveness by the 
pledges is related to correspondingly lesser 
degrees of intellectual control. It should be 
noted that this significant correlation may 
have arisen by chance, since it was one of a 
much larger number of relationships investi- 
gated in the study. 


SUMMARY 


The present study investigated the relation- 
ship between performance under stress and 
the personality variables of intellectual con- 
trol and self-acceptance. In an attempt to pro- 
vide a more valid and definitive test of these 
relationships, the Ss were presented with a 
realistic stress situation and had a common 
motivation to succeed. The behavioral cri- 
terion was decrement in performance on the 
Digit-Symbol test. Measures of intellectual 
control were derived from the Rorschach test, 
and the Berger scale was used to obtain measures
of self-acceptance. The following research
procedure was observed: (a) the Rorschach
test was administered to each of 30
pledges, which took approximately 1 month;
(b) the pledges then met as a group and completed
five practice trials on the Digit-Symbol
test, and took the Berger scale under control
conditions; (c) the following day the pledges
met again as a group to complete six more
trials on the Digit-Symbol test, the latter
three serving as control measures; and (d)
they were then immediately taken to the
experimental laboratory to complete three more
trials on the Digit-Symbol test and to take
the Berger scale under conditions of stress.

The major conclusions to be drawn from
this study based on the experimental conditions
are as follows: (a) support is lacking
for use of either the Rorschach test or the
Berger scale to predict performance under
stress, (b) the personality variables of intellectual
control and self-acceptance do not appear
to be major correlates of behavior under
stress, and (c) confirmation is lacking for the
validity of the Rorschach constructs. The
Rorschach findings are consistent with those
of Carlson and Lazarus, who did not obtain
comparable results in a replication of the
study done by Williams.


SOCIAL DESIRABILITY AND RESPONSE BIAS
IN THE MMPI


CHARLES HANLEY 


Michigan State University 


Two sources of individual differences in re- 
sponses to self-report inventories stand apart 
from traditional concepts of personality. The 
first is the degree to which subjects (Ss) are 
affected by the “social desirability” of attitudes
expressed in inventory items. Scales
that measure “defensiveness,” “plus-getting,”
“dissimulation,” and “malingering” focus on
this factor. The second is seen when Ss are
influenced by the form of the answer sheet.
Measures of “acquiescence,” “response set,”
and “response bias” are concerned with the
effects of response categories. Both kinds of
measure, it is hoped, will ultimately be use- 
ful in suppressing personality scale variance 
irrelevant in diagnosis and screening. 

A recent study reported by Wiggins (1959) 
yields important information on a number of 
scales used for the MMPI. Eleven different 
measures, nine of which deal with some as- 
pect of social desirability, are compared and 
found to differ widely in ability to discrimi- 
nate between protocols of undergraduates in- 
structed to give the socially desirable answer 
to each MMPI item and protocols obtained 
under standard instructions. Wiggins draws 
conclusions from this study that raise general 
questions regarding past and future work with 
measures of social desirability. Wiggins dis- 
tinguishes two approaches to the measure- 
ment of test taking defensiveness; these differ 
in the manner in which scales are constructed 
and originally validated. A measure built to 
discriminate Ss given instructions aimed at 
maximizing defensiveness from Ss taking the 
inventory under normal conditions has been 
constructed by the “empirical” method and 
possesses “empirical” validity. A scale successfully
devised to correlate in expected directions
with diagnostic scales has been constructed
by the “rational” method and has
“rational” validity. The two most effective
measures in Wiggins’ study are both empiri- 
cal scales. Wiggins concludes that “empirical 
methods are the methods of choice” (p. 427) 
in constructing measures of social desirability. 
From analysis of correlations between the 
various measures, Wiggins suggests that ear- 
lier studies, presumably those using rational 
methods, “would be more appropriately con- 
sidered as studies of response bias” (p. 426). 
Finally, from reading the paper it is difficult 
to escape the impression that the empirical 
method of validation he employs is a close 
approximation of the real life screening and 
diagnostic situation. 

The purpose of the present paper is to ex- 
amine: (a) the degree to which effectiveness 
in his study is consistently related to the em- 
pirical-rational distinction as well as to other 
dimensions he has not considered, (b) whether
the influence of response bias in rational
measures is as clear as he suggests, and (c)
whether empirical validation, when employed
with specially instructed and standard groups,
is free from defects specific to the procedure.
The first point can be clarified by detailed
examination of Wiggins’ data. The second
and third require additional data obtained
for the purpose. The analysis that follows is
not intended to dispute the potential effectiveness
in the real life situation of any specific
scale, but rather to consider general
procedural questions that bear on the construction
of useful measures of the influence
of social desirability.


CLASSIFICATION AND EFFECTIVENESS 
OF MEASURES 
Characteristics of measures of defensiveness
can be illustrated by referring to eight scales
studied by Wiggins. (Three others are omitted
in the interest of simplicity; they play a
minor part in his analysis.) These are L and
K, both standard MMPI measures, Edwards’
SD (Fordyce, 1956), Ex (Hanley, 1957),
Cof (Cofer, Chance, & Judson, 1949), Sd
(Wiggins, 1959), Ds (Gough, 1954), and B
(Fricke, 1957). All but the last are concerned
with social desirability.

Wiggins emphasizes the importance of
method of validation. To it should be added:
(a) use of the results of some type of judgment
of item content in determining whether
or not to include items in a scale, (b) use of
response frequencies to determine inclusion or
rejection of items, and (c) the original aim of
the scale. Similarities and differences among
the eight measures with respect to all of these
variables are summarized in Table 1.


TABLE 1

CONSTRUCTION AND EFFECTIVENESS OF MMPI MEASURES IN WIGGINS’ (1959) STUDY

                       Item       Response                          Effectiveness
Measure  Validation    Content    Frequencies  Aim                  (phi coef.)

Sd       Empirical     Explicit   Yes          Defensiveness        .721 a
Cof      Empirical     Implicit   Yes          Defensiveness        .619
L        None          Explicit   Guessed      Defensiveness        .539
Ex       Rational      Explicit   Yes          Def. & Plus-get.     .461
SD       Rational      Explicit   No           Defensiveness        .330
K        Empirical     None c     Yes          Def. & Plus-get.     .217
Ds       Empirical     Implicit   Yes          Dissimulation
B        Rational      None       Yes          Response Bias

a .683 on cross-validation.
b Labeled En in
c Except for eight items.


Method of Validation. Table 1 indicates
that empirical and rational procedures were
used in the original validation of most of the
scales. The L scale, however, was included in
the MMPI without a validity study.

Item Content. Item selection for several
scales was wholly or partly dependent on explicit
judgments of item content. The L scale,
for example, consists of items written to allow
defensive individuals to claim unrealistically
favorable traits. Edwards selected items for
his SD measure after 10 judges gave socially
desirable answers to a pool of F, K, and Taylor
MAS items. Judgments of item content
also played an important role in the construction
of the Sd and Ex measures.

The Cof and Ds scales were derived in part
by having certain Ss “fake” roles. These instructions
seem to involve implicit judgment
of item content on the part of such Ss. Eight
of the 30 K items were also chosen on the
basis of results of a faking study (Meehl &
Hathaway, 1946, p. 543).

Item content was not considered in the 
derivation of Fricke’s B measure. Selection 
without attention to individual item content 
can be illustrated in the case of the 22 K 
items that constitute the L6 scale (Meehl & 
Hathaway, 1946). 

In brief, L6 was derived by an item analysis of the 
responses of 25 males and 25 females in the psycho- 
pathic hospital whose profiles showed an L score of 
T = 60 or more and who, with the exception of six 
normal cases, had diagnoses indicating the probabil- 
ity that they should have had abnormal profiles but 
whose profiles were in reality within the normal 
range (p. 540).

The item responses of these fifty cases handled sepa- 
rately for males and females were compared to the 
male and female item frequencies from the general 
group of males and females that has been used in 
past scale derivations. In all, 22 items were chosen 
as a result of this comparison (p. 541). 

After these items had been selected, Meehl 
and Hathaway described them as giving an 
“over-all impression” of “impunitiveness” (p. 
541). That selection on the basis of item con- 
tent is not the same as interpretation follow- 
ing selection is indicated by the fate of Book- 
let Item 461, keyed “true” on Sd, Cof, and 
Ex but “false” on K. 

Response Frequencies. Several measures 
were derived wholly or partly by use of response
frequencies obtained from groups taking
the MMPI. The quotations from Meehl
and Hathaway given in the preceding paragraph
indicate the use of frequency data in
selecting items for the L6 scale incorporated 
into K. To obtain the remaining eight K 
items, moreover, response frequencies also 
played a role. Items in the Ex pool were in- 
cluded only if 36-64% of Hathaway’s college 
sample had endorsed them. The Cof, Ds, and 
Sd scales were constructed in part by com- 
paring frequencies of endorsement obtained 
from various groups given special and stand- 
ard instructions. 

Response frequencies were not used in the 
construction of the SD measure, although 
some of its items were drawn from the K 
pool and thus remotely reflect response data. 
The authors of the L scale did not inspect response
frequencies for their 15 items but as-
sumed that few honest persons could answer 
them in the socially desirable direction. 

Items for the B scale, a measure of re- 
sponse bias, were chosen entirely on the basis 
of response frequencies, only those endorsed 
by 40-60% of Hathaway’s normative sample 
being used. 

Aim. Most of the measures in Table 1 are 
oriented toward test taking defensiveness, the 
tendency to give socially desirable rather than 
personally relevant answers. K and Ex, how- 
ever, also aim at plus-getting, the tendency to 
be overly critical of oneself. The Ds scale is 
directed at the detection of deliberate plus- 
getting. The B scale, as indicated before, is 
aimed at response bias rather than defensive- 
ness. 

Effectiveness. Wiggins presents extensive 
data on relative effectiveness of the various 
scales in discriminating between a sample of 
250 college students instructed to give the 
socially desirable response to each MMPI 
item and a sample of 190 students taking the 
inventory under standard conditions. Wiggins 
determined mean scores separately for men 
and women. Scales that significantly differen- 
tiated records obtained under the two condi- 
tions were analyzed to estimate the degree to 
which this differentiation was accurate. His 
data, expressed as phi coefficients, are shown 
in the last column of Table 1. These are 
based on pooled male and female protocols. 
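
A phi coefficient of this kind summarizes how well a scale cutoff separates two groups in a 2 x 2 table. The sketch below is an illustration only: the cell counts are invented (only the group sizes of 250 and 190 come from the text), the function name `phi_coefficient` is hypothetical, and the cutoff procedure Wiggins actually used is not described here.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]]."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denom


# Invented counts: rows are instruction groups, columns are scale cutoffs.
#                         above cutoff   below cutoff
# desirable set (250)          200             50
# standard set  (190)           40            150
phi = phi_coefficient(200, 50, 40, 150)
print(f"phi = {phi:.3f}")
```

Phi is 0 when the cutoff classifies the two groups no better than chance and 1 when it separates them perfectly.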

Using the categories in Table 1, we can ex- 
amine in order the qualities associated with 


effectiveness in Wiggins’ study. First is validation.
The L scale does well despite lack of any
original validation, although it is by far the 
shortest scale studied. While the most effec- 
tive measures are the empirical Sd and Cof 
scales, the equally empirical K scale is the 
worst of the lot. Empirical validation, it ap- 
pears, has no systematic advantage over other 
types. 

A noticeable characteristic of effective meas- 
ures appears in the item content column of 
Table 1. The single defensiveness scale not 
consistently employing attention to item con- 
tent is K, which fares badly. 

Data on response frequencies are equally 
useful. The one defensiveness measure not 
using such information is SD, which is rela- 
tively ineffective. Even “guessed” response 
frequencies, as in the case of L, are better 
than none according to Wiggins’ results. 

Comparison of the Aim and the Effective- 
ness columns indicates scales designed to 
measure defensiveness may have some ad- 
vantage in Wiggins’ study over scales with 
broader aims. Ex, devised to measure both 
defensiveness and plus-getting, does not fare 
badly, but K, with similar aims, is ineffec- 
tive. Scales oriented toward behavior other 
than defensiveness, as in the case of Ds and 
B, are completely ineffective, a result that is 
not unexpected. 

In summary, the entries in Table 1 indi- 
cate that several characteristics distinguish 
between measures that were effective and ineffective
in Wiggins’ investigation. Lack of
attention to item content and response frequencies
is more clearly associated with
ineffectiveness than is the empirical-rational
dimension. The advantage for empirical vali- 
dation is not as systematic as Wiggins’ con- 
clusions indicate. 


RATIONAL VALIDITY AND RESPONSE BIAS 


The superiority of Sd and Cof is based 
solely upon empirical validation. For Ex, K, 
and SD, Wiggins’ results reveal superior ra- 
tional validity, that is, higher correlations 
with MMPI diagnostic keys. Supporting his 
preference for the empirical approach is the 
suggestion that these correlations between de- 
fensiveness and diagnostic measures result 
from response bias. He presents additional 





16 Charles Hanley 


data showing rational measures to be highly 
correlated with Fricke’s B scale. These same 
data, however, indicate that the empirical K 
measure also correlates highly with B. Care- 
ful consideration of these results is needed. 

The measurement of response bias is based 
entirely on rational validity. Any scoring key 
with an imbalance of “true” and “false” re- 
sponses is expected to correlate with any 
other imbalanced key. When Wiggins reports 
a correlation of -.638 between K and Sc, for
example, it can be attributed to response bias, 
because all but one of the 30 K items are 
keyed false, and 59 of the 78 Sc items are 
keyed true. The S set to answer true should 
get a high score on Sc and a low one on K. 
The reverse holds for the person biased to 
answer false. Individual differences in defen- 
siveness, however, lead to the same empirical 
result. 
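
The keying argument can be made concrete with a small expected-score calculation. The keying counts are those given in the text (K: 1 item keyed true and 29 keyed false; Sc: 59 keyed true and 19 keyed false). The simplifying assumption that a respondent answers every item “true” with the same probability, and the function name `expected_score`, are illustrative only.

```python
def expected_score(n_keyed_true, n_keyed_false, p_true):
    """Expected raw score if each item is answered true with prob. p_true."""
    return n_keyed_true * p_true + n_keyed_false * (1 - p_true)


# Keying counts as given in the text: (keyed true, keyed false).
SCALES = {"K": (1, 29), "Sc": (59, 19)}

for p_true in (0.3, 0.5, 0.7):  # hypothetical tendencies to answer "true"
    k = expected_score(*SCALES["K"], p_true)
    sc = expected_score(*SCALES["Sc"], p_true)
    print(f"P(true) = {p_true:.1f}: K = {k:.1f}, Sc = {sc:.1f}")
```

As the tendency to answer “true” rises, the expected K score falls and the expected Sc score rises, which by itself would yield a negative K-Sc correlation across respondents.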

In devising B, Fricke (1957) assumed that 
items of greatest “controversiality” (i.e., items 
yielding nearly equal numbers of true and 
false responses) are most susceptible to re- 
sponse bias. The B scale is composed of all 
MMPI items endorsed by 40-60% of Hath- 
away’s normal samples and not appearing on 
K. As Table 1 indicates, B is a rational measure
constructed entirely from response frequencies.
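
The selection rule described for B (items endorsed by 40-60% of the normative sample and not appearing on K) amounts to a simple filter. In the sketch below the item numbers and endorsement rates are invented, the function name `select_b_items` is hypothetical, and treating the 40% and 60% bounds as inclusive is an assumption.

```python
def select_b_items(endorsement, k_items, low=0.40, high=0.60):
    """Return item numbers endorsed by low-high of the sample and not on K."""
    return sorted(
        item
        for item, rate in endorsement.items()
        if low <= rate <= high and item not in k_items
    )


# Invented endorsement rates (proportion answering "true") by item number.
endorsement = {1: 0.45, 2: 0.72, 3: 0.58, 4: 0.40, 5: 0.12, 6: 0.51}
k_items = {3}  # hypothetical: item 3 already belongs to K
print(select_b_items(endorsement, k_items))
```

Nothing in this rule looks at what an item says, which is why the social desirability of the selected items has to be examined separately.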

Securing adequate measures of response 
bias is made difficult by questions as to the 
existence of several such biases (Jackson & 
Messick, 1958; Hanley, 1959). If these prob- 
lems are set aside, however, another difficulty 
arises. Should items on the MMPI express 
undesirable traits more often than desirable 
ones, the use of response frequencies alone in 
item selection places the psychologist at the 
mercy of the manner in which the authors of 
the inventory worded their items. A scale 
based entirely on response data may have an 
excess of items describing undesirable charac- 
teristics. If this occurs, response bias and de- 
fensiveness are confounded. An individual will 
tend to obtain a low score, for example, by 
giving socially desirable answers to items. 
Correlations between the response bias meas- 
ure and defensiveness scales then would be 
partly due to the role of social desirability. 
This has been suggested regarding correla- 


tions between Fricke’s OAIS Set T scale 
(1956) and MMPI measures (Hanley, 1957). 

The correlations between B and defensive- 
ness measures are inconclusive if it can be 
shown that B is affected by social desirability. 
Extensive as Wiggins’ data are, additional in- 
formation is needed to settle this question. In 
the original study of Ex (Hanley, 1957), it 
was recognized that item imbalance might 
lead to contamination by response bias. For 
this reason, a second version, Sx, containing 
equal numbers of true and false responses, 
was described together with data showing 
that it correlated significantly with several 
MMPI diagnostic and validating scales. When 
social desirability was ignored and all Sx 
items keyed true to give a measure of re- 
sponse bias (AT), correlations were obtained 
with several MMPI keys in the predicted di- 
rection. Sx and AT scores, however, were not 
significantly correlated. From both sets of 
correlations, it was concluded that many 
MMPI measures were influenced by both 
response bias and defensiveness. 

B has correlations of .49 with AT and —.33 
with Sx, computed from the protocols of Han- 
ley’s 1957 sample. Both coefficients are sig- 
nificant at the 1% level. These results suggest 
that B is influenced by defensiveness as well 
as response bias. More direct evidence on this 
point, however, is obtained from judgments of 
the social desirability of the items comprising 
the B scale. 

Social desirability judgments were available 
for 25 B items from the earlier study (Hanley, 
1957). The remaining 38 B items, together 
with three markers that help define the ex- 
tremes and middle of a nine-point social de- 
sirability rating scale, were rated by 26 male 
and 33 female Michigan State University stu- 
dents in two undergraduate child psychology 
sections. Social desirability of an item is de- 
fined by its median rating. As in the earlier 
study, items with values of 4 or less were 
categorized as undesirable and those rated 6 
or more desirable, while items with medians 
between 4 and 6 were treated as neutral. 
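
The categorization rule can be sketched as follows. The cutoffs (median of 4 or less, 6 or more, otherwise neutral) come from the text; the ratings shown are invented for illustration.

```python
from statistics import median

def classify_item(ratings):
    """Classify an item from its judges' ratings on the nine-point
    social desirability scale, using the median as the item's value."""
    m = median(ratings)
    if m <= 4:
        return "undesirable"
    if m >= 6:
        return "desirable"
    return "neutral"

# Hypothetical ratings from a handful of judges.
print(classify_item([2, 3, 4, 4, 5]))  # median 4
print(classify_item([4, 5, 5, 6, 6]))  # median 5
print(classify_item([6, 7, 7, 8, 9]))  # median 7
```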

Of 63 B items, 21 were judged undesirable, 
32 neutral, and 10 desirable. The scale, it
appears, has an imbalance of socially undesirable
items.

TABLE 2

INTERNAL CONSISTENCY RELIABILITY
OF SUBSCALES OF B

                          Undesirable Items
Empirical reliability
  100 Men                 .576
  68 Women                .653
Reliability corrected
  100 Men                 .674
  68 Women                .741


Another way to demonstrate the imbalance
is to consider reliable variance. If the hypothesis
is correct, undesirable and desirable items
will be influenced by two sources of syste- 
matic variance: social desirability and re- 
sponse bias. Neutral items will be unaffected 
by social desirability. Neutral items, therefore, 
should contribute less variance to the B scale, 
provided allowances are made for difference in 
number of items involved. 

To test this hypothesis, the B scale was 
broken into homogeneous subscales of unde- 
sirable, neutral, and desirable items. Kuder- 
Richardson Formula 20 reliabilities computed 
for the three subscales are shown in Table 2. 
The Ss were 100 males and 68 females, who
in 1955 had taken the MMPI in introductory
psychology classes at Michigan State University.
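
For readers unfamiliar with Kuder-Richardson Formula 20, a minimal sketch of the computation is given here. The response matrix is invented, and the use of the population variance of total scores is an assumption; some treatments use the sample variance instead.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 internal consistency.

    responses: one row per subject, each row a list of 0/1 scores
    (1 = answered in the keyed direction) for the k items of a subscale.
    """
    n_subjects = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)  # population variance (assumption)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / total_variance)

# Hypothetical 4-subject, 3-item response matrix.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(data), 3))
```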

Empirical reliabilities are given in the upper
half of Table 2. Since the subscales differ
markedly in length, these values must be cor- 
rected to make comparisons meaningful. We 
ask, therefore, what reliabilities would be ex- 
pected if all subscales consisted of 32 items. 
The entries in the lower half of Table 2, 
computed by the generalized Spearman-Brown 
formula (Guilford, 1954, p. 354), answer this 
question. The desirable and undesirable items 
have greater internal consistency than the 
neutral ones, a result in agreement with the 
hypothesis that these two subscales contain 
variance associated with social desirability. 
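
The length correction can be checked with a short sketch of the generalized Spearman-Brown formula. It uses the 21 undesirable items reported earlier and the two undesirable-item reliabilities that remain legible in Table 2, read here as .576 and .653 (the decimal placement is an assumption); the function name is illustrative.

```python
def spearman_brown(r, length_ratio):
    """Reliability expected when a scale with reliability r is
    lengthened (or shortened) by the factor length_ratio."""
    return length_ratio * r / (1.0 + (length_ratio - 1.0) * r)

N_UNDESIRABLE = 21   # undesirable B items, from the text
TARGET_LENGTH = 32   # common length used for comparison, from the text
ratio = TARGET_LENGTH / N_UNDESIRABLE

for group, r in (("men", 0.576), ("women", 0.653)):
    print(group, round(spearman_brown(r, ratio), 3))
```

The results agree with the corrected values of .674 and .741 that survive in the lower half of Table 2.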
Social desirability in B can be shown in yet 
another way. B has an internal consistency of 
.628 in these women and .647 in the men. By 
keying responses to the 10 desirable items 
false and scoring all others true, the role of 
social desirability is increased at the expense 
of response bias. The internal consistencies of
the revised measure are .673 for the women
and .640 for the men, results again demonstrating
that B is affected by social desirability.

The conclusion that rational validities of
certain defensiveness measures should be considered
the result of response bias must be
strongly qualified whenever it is based on 
correlations involving B. To devise a satis- 
factory measure of the hypothetical general 
response bias to inventory items, one should 
use judgment of content to eliminate an im- 
balance in socially desirable and socially un- 
desirable items. 


EMPIRICAL VALIDATION 


The third aim of the present study concerns 
the extent to which validation of the kind em- 
ployed by Wiggins risks incorporating vari- 
ance specific to the procedure. Such variance 
will be irrelevant to defensiveness as it occurs 
in diagnostic and screening situations in real 
life. A clue to one type of such specific variance
is given by the fact that the L scale is
one of the more effective measures in Wiggins’ 
study. A likely source of such specific variance 
arises in items that are obviously measures of 
defensiveness and whose keying as such is 
transparent. An “obvious” item, to borrow
Wiener’s (1948) term, is one a sophisticated
individual will recognize as a trap for defensiveness.¹
The L scale is thought to suffer from
such obviousness: “At least, one may conclude
that the intent to deceive is not often detectable
by L when the subjects are relatively
normal and sophisticated” (Meehl & Hathaway,
1946, p. 538). Obviousness, however,
is undesirable only if keying of responses is 
transparent. When defensive Ss identify an 
item as pertaining to defensiveness, but think 
that the nonkeyed response is the critical one, 
the item remains effective. There may exist,
therefore, obvious items worthless in practical
use because their scoring is transparent, obvious
items useful because their scoring is disguised,
and items so subtle that scoring is no
issue.


1 An item may be “obvious” for some purposes but
“subtle” for others. In Wiener’s study, for example,
the item “I am happy most of the time” is considered
a subtle measure of Pa but an obvious measure of
Hy. Obvious defensiveness items, in the same way,
may be subtle on other scales.


TABLE 3

PROPORTIONS OF ITEMS RECEIVING DIFFERENT
NUMBERS OF “OBVIOUS” JUDGMENTS
FROM 48 JUDGES

                 Number of “Obvious”
                 Judgments Received

Scale         36-23    21-16    15-9    8-0

L              .40      .33     .20
Sd             .30      .22     .20
Cof            .26      .29     .18
Ex             .19      .42     .27
K              .13      .30     .30
All Items      .25      .24     .26

In validating procedures of the instructed 
vs. standard groups variety, obvious-transpar- 
ent items are as effective as any others in dis- 
tinguishing between instructed and control Ss. 
Ss asked to give the socially desirable answer 
or to fake a role should do so with obvious as 
well as subtle items. Controls taking the in- 
ventory under standard instructions should 
avoid defensive answers to obvious-transpar- 
ent items. A scale derived from comparison of 
the two groups ought to be effective in similar 
investigations, but many of its items may 
prove useless in real life applications. For this 
reason, performance in Wiggins’ study alone 
is an unsatisfactory standard against which to 
judge various methods of constructing meas- 
ures of test taking defensiveness. 

To determine proportions of obvious items 
in the empirical and rational scales, 18 male 
and 30 female students in the sections used 
3 weeks earlier for judgments of the B scale 
each selected the 30 to 40 items most obvi- 
ously measuring defensiveness from a list of 
the 103 items on K, L, Ex, Cof, and Sd.
Next, they gave the defensive answer to every 
item they had chosen. From their choices 
come two kinds of information: obviousness of 
each item and transparency of its scoring. 

Obviousness of Items. Data on obviousness 
appear in Table 3. The 103 items, several of 
which occur on more than one measure, are 
grouped very nearly into quartiles according 
to the number of obvious judgments received 
from the 48 judges. The L scale clearly is
composed of a relatively large number of obvious
items. K, on the other hand, is least
obvious, a finding that supports the authors 
of the MMPI in their belief that K is the 
subtler measure. 

The other three measures fall between the 
two extremes. The empirical Cof and Sd scales 
have a greater proportion of extremely ob- 
vious items than the rational Ex measure. If 
the first two columns of Table 3 are pooled, 
however, the advantage for Ex disappears. At 
this point, data on judges’ agreement on the 
defensive answer are relevant. Of the 52 items 
receiving 16 or more judgments of obvious, 
answers to 8 were confused to the extent that 
one-fourth or more of the judges disagreed 
with the majority. One Cof and four Sd items 
were subject to such extensive disagreement, 
but the majority of judges in each case chose 
the keyed response. One K item was so af- 
fected, but the majority answer was wrong. 
Of three Ex items disagreed on, only one was 
answered in the keyed direction by the ma- 
jority. K and Ex, it seems, are even less af- 
fected by item obviousness than Table 3 in- 
dicates. 
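
The effect of pooling the first two columns of Table 3 can be verified with a few lines of arithmetic. The proportions below are transcribed from the table, with the decimal points restored where the printed copy dropped them (an assumption); only the first two columns are needed.

```python
# Proportions of items receiving 36-23 and 21-16 "obvious" judgments,
# as read from Table 3.
table3 = {
    "L":   (0.40, 0.33),
    "Sd":  (0.30, 0.22),
    "Cof": (0.26, 0.29),
    "Ex":  (0.19, 0.42),
    "K":   (0.13, 0.30),
}

pooled = {scale: round(a + b, 2) for scale, (a, b) in table3.items()}

for scale, share in pooled.items():
    print(f"{scale:<4} {share:.2f}")
```

On the pooled figures Ex no longer shows a smaller share of obvious items than Cof or Sd, while K remains the lowest.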

Transparency of Scoring. Most items judged 
obvious were answered by the average judge 
as keyed on individual scales; nevertheless, 
some were answered in the nonkeyed direction 
(i.e., judges thought the “honest” answer was 
the defensive one). For a systematic study of 
this behavior, responses to all items receiving 
nine or more judgments of obvious were an- 
alyzed—those with fewer are so subtle that 
transparency is no issue. 

The results of this analysis may be ex- 
pressed by a ratio of incorrect to correct aver- 
age identifications of the keyed response. With 
K, for example, the ratio is 7/15—7 incorrect 
and 15 correct identifications out of 22 items 
receiving 9 or more obvious judgments. Ratios 
for the other scales are: Ex 5/18, Cof 1/24, 
Sd 3/26, and L 0/15. These data demonstrate 
that the most effective scales in Wiggins’ study 
tend to be more transparent in scoring than 
is the case with those he found less useful. 
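
To compare the ratios on a common footing, each can be converted to the share of obvious items whose keyed response the average judge misidentified. The counts are those given in the text; the conversion to a share and the function name are illustrative choices, not part of the original analysis.

```python
# (incorrect, correct) average identifications of the keyed response,
# for items receiving nine or more "obvious" judgments.
ratios = {
    "K":   (7, 15),
    "Ex":  (5, 18),
    "Cof": (1, 24),
    "Sd":  (3, 26),
    "L":   (0, 15),
}

def misidentified_share(incorrect, correct):
    """Share of obvious items whose keyed direction was misjudged."""
    return incorrect / (incorrect + correct)

# Scales ordered from least to most transparent scoring.
ranking = sorted(ratios, key=lambda s: misidentified_share(*ratios[s]), reverse=True)

for scale in ranking:
    print(f"{scale:<4} {misidentified_share(*ratios[scale]):.3f}")
```

A higher share means that judges more often mistook the keyed direction, that is, the scoring is less transparent.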

Subtle Items. While the empirical method 
used by Wiggins and by Cofer et al. is prone 
to include items undesirable in a measure of 
defensiveness, the data in Table 3 indicate 
that it uncovers a fair number of subtle items.
Prominent among those falling into the lowest
quartile of obviousness are items whose con- 
tent includes the words “I like.” Some 15 of 
the 103 items contain this expression, and 10 
of these are among the subtlest. An example 
is the item: “I would like to be a soldier.” 
Five subtle “like” items appear on Cof, six on 
Sd, one on K, and none on L and Ex. Lorge 
(1937) discovered response bias on the like 
items on the Strong VIB, and Hanley (1959) 
has presented evidence of a set specific to such 
items rather than to items in general. A de- 
fensiveness measure with many like items can 
be affected by individual differences in the 
specific response set. For this reason, subtle 
like items may be undesirable in a measure 
of defensiveness. 

Despite proneness to a specific set, like 
items may prove useful in the screening and 
diagnostic situations. It is possible that Ss 
attempting to portray themselves in an overly 
favorable manner tend to like almost every- 
thing. If this is true, there is no objection to 
the use of such items, save for the reservation 
expressed above. 


DISCUSSION 


The results of Wiggins’ study show high 
empirical validity for the Sd and Cof meas- 
ures. Rather than presenting his findings as 
only a validation and cross-validation of these 
particular scales, Wiggins has taken the more 
constructive path of raising general methodo- 
logical questions that relate to all measures of 
social desirability. The danger arises that the 
success of his scale may lead to an uncritical 
acceptance of the methodological analysis in 
which he employs the concepts of empirical 
validation and response bias to account for 
different efficiencies and correlations in his 
comprehensive sample of defensiveness meas- 
ures. By use of his own extensive data, how- 
ever, it has been possible to show that method 
of validation is less systematically related to 
effectiveness than is selection by attention to 
item content and response frequencies. The 
relatively ineffective measures in his study, 
the empirical K and the rational SD scales, 
lacked one of these two selection criteria in 
their construction. 

Wiggins properly points to the possible contamination
of rational scales by response bias,
but an empirical method has produced one
scale, K, that is probably so affected. New 
data on defensiveness in the B scale, which he 
used to measure response bias, indicate that 
it is affected by social desirability. While con- 
struction of rational measures certainly should 
aim at balancing items to eliminate response- 
bias contamination (Hanley, 1957), the em- 
pirical methods employed by Wiggins and by 
Cofer et al. appear from data presented in this 
paper to suffer the limitation of including 
items that may be too obvious to be useful in 
real life measurement of test taking defen- 
siveness. 

The data that indicate excellent discrimina- 
tion for the empirical Sd and Cof measures 
show good discrimination for the rational L 
and Ex scales. The L scale is short, and the Ex 
measure was originally presented as a meth- 
odological demonstration rather than as a 
practical instrument (Hanley, 1957). In view 
of the success of these four scales, it should 
be emphasized that both empirical and ra- 
tional methods can work in the contrasted 
groups’ approximation to the real-life situation. 
(Correlations among these measures in Wig- 
gins’ sample of control men demonstrate that 
Sd and Cof do not form a pair clearly distin- 
guished from the other measures. Cof and Ex, 
for example, correlate more highly than Cof 
and Sd, despite a 14-item overlap in the latter 
two scales.) 

Validation by contrasted groups, however, 
remains only an approximation to screening 
and diagnostic performance. For this reason, 
Wiggins’ results do not foreclose the possibility
that a seemingly ineffective scale may be
useful in actual practice. For any defensiveness
measure to aid in screening and diagnosis,
moreover, one condition must be met: if a 
linear regression model is used, the defensive- 
ness scale must correlate with the diagnostic 
measure; that is, a scale cannot suppress irrelevant
variance in a predictor unless it correlates
with the predictor. Rational scales seem
more apt to meet this requirement than do empirical
measures, save for K. The correlation
of —.091 between Sd and Sc reported by 
Wiggins, for example, means that Sd cannot 
suppress variance in Sc that is associated with 
defensiveness. While it is possible to hold that 
the role of defensiveness in responses to inventories
is seriously exaggerated (that the
low correlation is the outcome of honest answers
to Sc items), so many psychologists have
assumed defensiveness operates to reduce the 
effectiveness of inventories that this plausible 
but radical hypothesis needs verification in 
studies of protocols obtained from patients 
and controls in screening and diagnostic set- 
tings. Until then, no method can rightly be 
termed “the method of choice.”
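
The suppressor condition stated above can be illustrated with the standard two-predictor formula for the squared multiple correlation. The sketch assumes a classical suppressor, that is, a defensiveness scale uncorrelated with the criterion, and a hypothetical validity of .40 for the diagnostic scale; only the value of -.091 comes from the text.

```python
def two_predictor_r_squared(r_y1, r_y2, r_12):
    """Squared multiple correlation of a criterion on two predictors,
    computed from the three zero-order correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

VALIDITY = 0.40  # hypothetical correlation of the diagnostic scale with the criterion

# r_12 is the correlation between the suppressor and the diagnostic scale.
for r_12 in (0.0, -0.091, -0.50):
    gain = two_predictor_r_squared(VALIDITY, 0.0, r_12) - VALIDITY**2
    print(f"r(suppressor, predictor) = {r_12:+.3f}   gain in R^2 = {gain:.4f}")
```

With a zero correlation the suppressor adds nothing, and with a correlation as small as -.091 the gain is negligible; an appreciable gain requires a substantial correlation with the predictor.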

The derivation of useful suppressor scales is 
not the only concern in studies of social de- 
sirability; there ought to be some explanation 
of failure and success. The methodological
considerations raised by Wiggins and by the
present paper are important for this reason.
Wiggins indicates what he believes is the important
dimension; the present paper presents
alternatives that fit his results. The final resolution
of these differences, however, rests on
studies as extensive as that of Wiggins but
conducted with actual rather than simulated
protocols. The methodological analyses indicate
dimensions that should be explored in
such a study.


SUMMARY 


MMPI scales related to social desirability 
differ in use of response frequencies and atten- 
tion to item content in selection of individual 
items. A reinterpretation of data from an 
extensive study by Wiggins (1959) indicates 
that scales using both response frequencies 
and judged item content in their construction 
are superior at discriminating controls from 
subjects instructed to respond to the MMPI 
in a socially desirable manner. Whether scales 
were originally validated by the “empirical” 
or the “rational” method is less systematically 
related to their effectiveness. 

The role of response bias in producing correlations
between rational scales and MMPI
diagnostic measures is unclear because of the
difficulty of obtaining a pure measure of response
bias. The B scale employed by Wiggins
to measure response bias is also influenced by
social desirability.


Derivation by contrasted groups, the em- 
pirical method used by Wiggins, suffers from 
the fact that it includes many items that are 
obvious measures of defensiveness and whose 
scoring is transparent. 

Preference for empirical or for rational procedures
should await studies of their effectiveness
in real life diagnostic and screening
situations.


AN ANALYSIS OF FIGURE

When a patient produces a rotation or reversal
for a stimulus figure during a psychological
test, this is interpreted as a sign of
brain damage. For instance, higher weighted 
scores for brain damage are assigned for rota- 
tions than for any other kinds of errors in the 
Graham-Kendall (1946) Memory-for-Designs 
Test (G-K). During routine administration of 
that test at this hospital, it has been found 
that an occasional intelligent patient produces 
a rotation without making any other scored 
error. A correctly reproduced figure, although 
rotated or reversed, seems on inspection to 
indicate better cerebration than a reproduc- 
tion in which the gestalt is changed or other 
kinds of errors are made for the same figure. 
So, since brain damaged persons produce rota- 
tions to a significantly greater extent than do 
others, it was hypothesized that such behavior 
might have significance also in another way. 
Could it represent an element of bluffing, a 
hysterical maneuver, a form of role-playing to 
overdemonstrate brain injury, contrariness,
lack of interest, distraction, transient physio- 
logical disturbances in the brain? Inspection 
of a few handy cases with rotation suggested 
the last as a likely possibility. 


PROCEDURE


Of the patients who had had electroencephalograms
(EEGs), all the G-K protocols at this hospital
containing rotations or reversals were collected to
find out if a patterning of some kind might emerge
when these protocols were compared as a group with
other data in the clinical files. Almost at once the
observation was made that there may be a loading
for epilepsy. There were 42 protocols with rotations
or reversals among a total of
tients who had also had EEGs.
Between 25-50%, varying from month to month, of
all patients admitted to this hospital since it
opened in 1952 have received EEGs. Cases have been
referred primarily for EEGs when shock therapy was
contemplated, when epilepsy or brain damage was
considered a possibility or was known, when patients
were regarded as alcoholics, as elderly psychotics, as
special diagnostic problems, etc. When a patient was
referred for both an EEG and a psychological evaluation,
both referrals were usually made at about the
same time.


1 Kenneth A. Kooi, now at the University of Michigan
Medical School, turned over to the author comprehensive
EEG data which were used in this study.
Reed S. Boswell, research psychologist at the Veterans
Administration Hospital in Salt Lake City, rechecked
G-K scores based on rotations made by the original
examiners and gave useful suggestions. Leonard W.
Jarcho, Chief of the Neurology Service of this hospital,
and Chairman of the Division of Neurology of
the University of Utah College of Medicine, critically
evaluated the results and interpretations and made
helpful suggestions for the manuscript.

All the protocols were separated into two major 
groupings based on EEG summary impressions of the 
electroencephalographers. The criterion for the first 
grouping was “normal” or “within normal limits.” 
This contained 129 cases and was set aside. The
second major grouping, containing 209 cases, was
broken down into two groups for the analysis. All
had “abnormal” or “borderline abnormal” records
such as “generalized slow patterns” or “slow alpha.”
When, in addition, mention was made of the
presence of transient episodic disruptions of the abnormal
background or usual patterns, the case was
placed in Group I. Notations for these were
such as “paroxysmal formations,” or “scattered sharp
formations,” or “spikes,” “occasional delta.”
This group contained 82 cases. Group II was composed
of the 127 cases with abnormal EEGs
without any notations of transient disturbances.


RESULTS


When Group I was compared with Group II
for frequency of rotations, the χ² was 19.67,
which is beyond the .001 level of confidence.
The r was .57. When Group I was compared
with the normal group or a combination of
this and Group II, the relationship was higher,
as there were only three cases in the normal
grouping with rotation. Group I contained 28
such cases.

When the cases were rearranged according
to diagnoses and without regard to EEG find- 
ings, relationships turned out to be appre- 
ciably weaker. A group of 77 patients with the 
diagnosis of epilepsy was compared with a 
group of 136 with the diagnosis of brain 
damage, for incidence of rotation. The χ² fell
down to 6.47, which reaches only the .02
level of confidence, and the r was only .33.
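
The first comparison reported above (Group I against Group II) can be approximately reconstructed from counts given in the text. The sketch assumes that Group II contained 11 rotation cases, inferred as 42 total rotation protocols less 28 in Group I and 3 in the normal grouping, and that a continuity (Yates) correction was applied; neither point is stated explicitly in the paper.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2 x 2 table
        [[a, b],
         [c, d]]
    """
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Rows: Group I (82 cases), Group II (127 cases).
# Columns: rotation present, rotation absent.
group_1 = (28, 82 - 28)
group_2 = (11, 127 - 11)  # 11 is inferred, not reported directly

chi2 = yates_chi_square(*group_1, *group_2)
print(f"corrected chi-square = {chi2:.2f}")
```

Under these assumptions the corrected value falls within about .01 of the reported 19.67, which supports the inferred count for Group II.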


DISCUSSION 


One explanation for the drop in relation- 
ships when comparisons were made according 
to diagnoses, is that often an “organic” has 
seizures of some kind without any mention of 
this fact in the diagnosis, perhaps because the 
seizures are regarded by a diagnostician as a 
minor symptom. Another is that seizures may 
not have been observed by trained profes- 
sional personnel. Still another is that a patient 
may never have had a recognized seizure. At 
any rate, the data demonstrate that rotations 
or reversals of G-K figures are associated with 
transient, episodic EEG disturbances, and 
they suggest that rotations may be a sign of 
epilepsy or a potential epileptic condition, 
possibly a subclinical manifestation. 

An interpretation of rotation could be as 
follows: the subject correctly perceives the 
stimulus figure, but by the time he starts to 
draw the reproduction, some kind of transient 
physiological dysfunction in the brain has oc- 
curred, altering his memory of it. There were 
five rotation cases among the whole series, 
each with a total error score of 3 on the G-K, 
these scores being based on one rotation and 
with no other scorable errors. This would sug- 
gest that each person correctly perceived all 
the figures, but had a crucial interference of 
vigilance while negotiating one of them. 

Frequently an epileptic patient, while doing 
the G-K test, might announce that he had 
forgotten the design. Alternatively, he might 
reproduce it incorrectly and then do it again 
correctly without prompting. So far tabula- 
tions for these behaviors have not been made 
here. When confronted with an incorrect re- 
production after the test was over, an epileptic 
subject frequently recognized the error and 
explained it in terms of momentary confusion, 


or would say that he had been distracted by 
thinking of something else, or was not paying 
enough attention. Prior to the confrontation, 
most subjects when asked could correctly 
point out reproductions which were inaccu- 
rate, suggesting an awareness of transient con- 
fusion. Now and then a subject might make a 
rotation or a marked error in an otherwise 
accurate record, and when asked after the 
test to find the error, would not only locate it 
but reproduce it correctly without reviewing 
the stimulus card. 

Somewhat comparable phenomena in exam- 
inations with the Wechsler Adult Intelligence 
Scale have already been reported (Hovey & 
Kooi, 1955; Kooi & Hovey, 1957). MMPIs 
administered to most of the same subjects in 
those studies also tended to produce character- 
istic profiles for epileptics (Hovey, Kooi, & 
Thomas, 1959). 

The majority of the diagnosed epileptic 
group and also the majority of the group with 
episodic EEG features had known brain dam- 
age. Only 2 of the 42 rotation cases had 
neither an abnormal EEG nor a diagnosis 
implying organicity. 

Chorost, Spivack, & Levine (1960) report 
that rotation of Bender-Gestalt figures by 
children was slightly associated with EEG 
abnormality but not enough for predictive 
purposes. The difference between my results 
and theirs could be explained by the finding 
that children generally have less ability than 
do adults to execute the drawing of designs 
(Pascal, 1951, pp. 23, 42). Therefore control 
groups of children might be expected to have 
a relatively high incidence of rotation. The 
adult groups used in the present study con- 
tained much smaller proportions of rotation 
than did theirs, approaching the vanishing 
point for subjects with normal EEGs. Further- 
more, the present project used 45° instead of 
their 30° criterion for rotation. However, di- 
rect comparisons between the two studies can- 
not be made since standard administration of 
the Bender figures permits continuous refer- 
ence to the stimulus figures, whereas a memory 
factor is involved in the G-K administration. 
The current observations are consistent with 
the prevailing opinion that rotation is associated
with organic disease of the adult brain.


SUMMARY


The performance on a design reproduction 
test of a group of patients with transient 
episodic EEG features was compared with a 
group having abnormal EEGs but without 
observed episodic features. The episodic group 
made rotations to a significantly greater extent.


Recent research has suggested that role
playing or empathic ability is related to gen- 
eral adjustment. Working with college popula- 
tions, a number of investigators (Dymond, 
1948, 1949, 1950; Lindgren & Robinson, 
1953; McClelland, 1951; Norman & Ains- 
worth, 1954) have found that better adjusted 
students play roles and empathize with greater 
facility than those who are less well adjusted. 
By logical extension it might be assumed that 
“normals” generally are more skilled in this 
function than neurotics and psychotics. Some 
very recent studies, however, have shown that 
certain schizophrenic groups have considerable 
role playing skill. 

Jackson and Carr (1955) reported, for ex- 
ample, that their normal controls demon- 
strated greater empathic ability than schizo- 
phrenic patients; their schizophrenic sample, 
however, was quite heterogeneous, some pa- 
tients consistently demonstrating more em- 
pathic ability than a number of controls. 
When Helfand (1956) compared the empathic 
ability of four groups—normals, nonpsychotic 
patients (tuberculous), and chronic and priv- 
ileged schizophrenics—privileged schizophren- 
ics proved to be superior to all others includ- 
ing normals. Some related information was 
produced by Grayson and Olinger (1957), 
who found that when asked to simulate “nor- 
malcy,” most of their psychiatric patients 
(largely schizophrenics) were able to improve 
their test performance and that improvement 
was related to early discharge from the hos- 
pital. 

1 This investigation was supported by a research 
grant (M-1529) from the National Institute of Men- 
tal Health, National Institutes of Health, United 
States Public Health Service. 


Presented, in part, at the meetings of the Western 
Psychological Association, San Diego, April 17, 1959.

The present study attempted to throw further
light on role playing in schizophrenia. On
the basis of previous research, the following 
hypotheses were formulated: 

1. Acutely ill schizophrenics are better able 
to play the normal role than chronically ill 
schizophrenics. 

2. Whether acutely or chronically ill, schiz- 
ophrenics who subsequently improve are bet- 
ter able to play the normal role than those 
who do not. 


METHOD 
Subjects 


The subjects (Ss) of the study were 54 hospitalized
women diagnosed as either acute or chronic schizophrenics.
Each of these groups was further
divided into fast and slow improvement subgroups.
Half of the total sample was Caucasian. The remainder
was Oriental or part-Hawaiian, with the
majority being Japanese.

The acutely ill group was made up of 25 patients, 
none of whom had a history of prior hospitalization 
beyond that associated with usual commitment pro- 
cedures. In addition, suddenness of onset in a pre- 
viously compensated personality structure was also 
used as a criterion which determined inclusion in this 
group. Within the acute group, 12 patients were con- 
sidered to be in the fast improvement subgroup and 
13 in the slow improvement one. This classification 
was based on an evaluation of the hospital course 
over the 6 months following the testing of the last 
case. All of the patients of the fast improvement subgroup
had been discharged as improved or recovered.
Their range of hospitalization was from 1 to
months with an average of 2.6 months. In the slow
improvement subgroup, seven patients had been discharged;
six remained in the hospital. Length of
hospital stay for these Ss ranged from 6 to 20 months
with an average of 10.9 months. In general, recovery
in the slow improvement subgroup was not only less
rapid, but it was also qualitatively less striking.
The chronic group was made up of 29 patients who 
had either been continuously hospitalized for 2 or 
more years or had been admitted at least twice and 
had a history of long-standing schizophrenic adjustment.
Within this group, 13 patients were placed in
the fast improvement subgroup and 16 in the slow 
one. All of the patients of the fast improvement sub- 
group had been discharged as improved. Their length 
of hospitalization (including all former periods) 
ranged from 8 to 65 months with an average of 29.2 
months. In the slow subgroup, no patients had been 
discharged or seemed ready for even preliminary con- 
sideration for discharge. Length of hospital stay for 
these Ss ran from 12 to 180 months with an average 
of 56.3 months.

Prior to inclusion in the study, patients under con- 
sideration were administered the Vocabulary subtest 
of the Wechsler-Bellevue, Form I, and only testable 
patients with a weighted score of seven or higher 
were used. None of the groups or subgroups differed 
significantly from others on vocabulary score or 
amount of formal education. The average number of 
school grades completed was 10.5. The chronic groups 
averaged 34 years old; the acute groups’ average age
was 30.5.


Procedure 


In studies of role playing and empathy, the S is 
usually asked to predict the response of another 
person who is in some way known to the S. This 
procedure has been criticized by a number of investi- 
gators. Hastorf and Bender (1952) emphasized that 
projection rather than empathy may account for part 
of the prediction of another person’s responses. Lind- 
gren and Robinson (1953) pointed out that instead 
of truly empathizing, S may respond in terms of a 
stereotype; and Helfand (1956) indeed found that 
his normals tended to rely on a conventional frame of 
reference although this was not true for his schizo- 
phrenic groups, who were deficient in such a ref- 
erence. 

Some investigators have explicitly undertaken to 
assess their Ss’ awareness of normative data rather 
than their awareness of a specified criterion individ- 
ual. Indeed, Crow (1959) suggested that when judges 
are asked to predict personality characteristics of 
criterion Ss, their judgments are more accurate if they 
are based upon stereotypes than if they are based on 
specific information about each criterion S. In Crow’s 
study, a variety of judges (student nurses, medical 
students, psychiatric residents, and others) were 
asked to estimate the age, intelligence, vocabulary 
level, and personality characteristics of 10 medical 
patients, based upon their seeing a 6-minute sound 
movie of each patient being interviewed by a physi- 
cian. In addition, the judges were asked to make 
estimations for the “average patient.” On the basis of 
these two kinds of judgments, it was possible to
compute a stereotype accuracy score (subtracting a
judge’s estimation for the average patient from each 
of the criterion scores) and an individual accuracy 
score (subtracting a judge’s estimation for each pa- 
tient from that patient’s criterion score). Crow found 
that stereotype accuracy was clearly more accurate 
for estimation of personality characteristics; that is, 
the judges would have been more accurate if they 
had given their estimation for the average patient
each time instead of making an individual prediction
for each patient.

The procedure used in the present investigation was 
similar to Crow’s and to the one used by Grayson 
and Olinger (1957) in that awareness of normative 
data was measured, that is, “abnormal” Ss were asked 
to play or simulate the “normal” role. Each S was 
tested in two sessions, the first session in the morning 
and the second in the afternoon of the same day. In 
the case of the acutely ill group, the testing was ac- 
complished within 1 week after hospitalization. In the 
first session, the Rorschach and the Sc scale of the 
Minnesota Multiphasic Personality Inventory were 
given under standard instructions. In the second session,
these two tests were administered again with
special role playing instructions to the S to respond 
in the way that a “typical, average, ordinary” per- 
son would. The instructions were repeated with each 
Rorschach card and wherever it seemed indicated on 
the MMPI Sc. The word normal itself was not used 
because preliminary investigation revealed that this 
term provoked negative reactions on the part of 
some patients.

Each Rorschach protocol was scored for the “prin- 
cipal indicators of schizophrenic disorganization” de- 
scribed by Schafer (1948). These indicators include 
low form level, use of pure color, sex responses, sud- 
den changes, irregular sequence of locations, and vari- 
ous types of deviant verbalizations. Every indicator 
was given a score of one point each time it appeared 
except that F+% between 50 and 59 was given a 
score of 1, and under 50 a score of 2. For each proto- 
col, the total number of indicators was divided by 
the number of responses to yield a schizophrenic 
disorganization quotient which took into account 
the productivity of the S. Statistical analysis of the 
Rorschach results was based on this quotient. The 
MMPI Sc was scored in the usual manner. 
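
The scoring rule just described can be sketched as follows. This is a minimal illustration, not the authors' scoring procedure: the function names are hypothetical, the example counts are invented, and an F+% of less than 60 but not less than 50 is assumed to fall in the "between 50 and 59" band.

```python
def form_level_points(f_plus_percent):
    """Points contributed by low form level (F+%)."""
    if f_plus_percent < 50:
        return 2  # under 50: score of 2
    if f_plus_percent < 60:
        return 1  # between 50 and 59: score of 1 (upper bound assumed)
    return 0


def disorganization_quotient(indicator_occurrences, f_plus_percent, n_responses):
    """Total indicator points divided by the number of responses.

    indicator_occurrences counts one point for each appearance of any
    indicator other than form level (pure color, sex responses, etc.).
    """
    if n_responses <= 0:
        raise ValueError("a protocol needs at least one response")
    total_points = indicator_occurrences + form_level_points(f_plus_percent)
    return total_points / n_responses


# Hypothetical protocol: 6 indicator occurrences, F+% of 55, 20 responses.
quotient = disorganization_quotient(6, 55, 20)
print(quotient)
```

Dividing by the number of responses is what keeps a highly productive S from receiving a high score merely by giving more responses.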


RESULTS 


The Rorschach schizophrenic disorganiza- 
tion indices and the MMPI Sc scores for the 
various groups under the two experimental 
conditions are presented in Table 1. On both 
the Rorschach and MMPI Sc, high scores 
were regarded as evidence of schizophrenia 
and reduced scores under role playing in- 
structions were considered to be evidence of 
ability to play the normal role. 

On the Rorschach, all experimental groups 
showed some reduction in sign of schizo- 
phrenic disorganization under role taking 
conditions. The acute group significantly re- 
duced its quotients, demonstrating a de- 
creased schizophrenic disorganization when 
playing the normal role. The fast improve- 
ment group similarly reduced its quotients. 
However, the degree of reduction of the signs
of schizophrenic disorganization in the chronic
group and in the slow improvement group
was slight and did not achieve significance.

TABLE 1

INDICES OF SCHIZOPHRENIC DISORGANIZATION ON THE
RORSCHACH TEST AND MMPI Sc SCALE UNDER STANDARD
AND ROLE PLAYING INSTRUCTIONS

[Table values not recoverable from the scan. Columns:
Rorschach Test (Disorganization Indicators and
Disorganization Quotient) and MMPI Sc Scale, each with
Mean and SD. Rows: Acutely Ill (N = 25), Chronically Ill
(N not legible), Fast Improvement (N = 25), and Slow
Improvement (N = 29), each under Standard and Role
Playing instructions.]

* Significant at .02 level.
** Significant at .01 level.

None of the MMPI Sc results achieved sta- 
tistical significance. It is interesting to note, 
however, that both the acutely ill and fast 
improvement groups appeared to reduce their 
MMPI Sc scores in assuming the normal role, 
while both the chronically ill and slow im- 
provement groups appeared to increase their 
scores on the same test under role taking 
conditions. 

The statistical tests of the two experimental
hypotheses are presented in Table 2. All
of the results are in the predicted direction,
but only one achieved statistical significance.
The acute group improved its Rorschach performance
significantly more in playing the
normal role than did the chronic group.

TABLE 2

ANALYSIS OF VARIANCE OF DIFFERENCES BETWEEN
TEST PERFORMANCE UNDER STANDARD AND ROLE
PLAYING INSTRUCTIONS

[Table body not recoverable from the scan. Columns:
MS and F for the Rorschach and for the MMPI. Sources
of variance: Chronicity of Illness, Rapidity of
Improvement, Interaction, Within Groups. Surviving
values, column assignment unclear: 592.24, 180.08,
16,253.83. A displaced fragment of Table 1 (MMPI Sc
Scale, Mean and SD) also survives at this point with
the values 16.64, 16.28, 14.83, 17.34.]


DISCUSSION 


Studies of changes in Rorschach protocols 
between two test administrations in the ab- 
sence of instructions to play a specified role 
suggest that signs of psychopathology visible 
on the first protocol are equally clear on the 
second. Griffith (1951) tested a sample of 
four patients with a diagnosis of Korsakoff 
syndrome whose memory was sufficiently im- 
paired so that they did not recall the first test 
situation when repeating the Rorschach 1 
day afterwards. In his summary of these four 
cases Griffith comments that autistic original 
percepts reliably characterized the individual 
and that this kind of responsivity is consist- 
ent on both protocols. Holzberg and Wexler 
(1950) studied a group of schizophrenics who 
were tested twice with the Rorschach test 
with an interval of 3 weeks between administrations.
Significant statistical changes could
not be demonstrated for any Rorschach fac- 
tor from test to retest. They suggested that 
chronic schizophrenics are highly consistent. 
These studies indicate that one can reason- 
ably expect reliable performance from test to 
retest even within a psychotic population in 
the absence of special role taking instructions. 
The description of the schizophrenic as a 
person lacking a concept of a “generalized 
other,” offered by Sarbin (1943) and elabo- 
rated by Helfand (1956), is consistent with 
the performance of the chronic schizophrenics 
of the present study. This description is less 
appropriate, however, for the acute schizo- 
phrenics since the Rorschach results suggest 
that this group has some conception of the 
normal role and can differentiate it from 
schizophrenia. This is particularly true of 
those acute schizophrenics who subsequently 
showed rapid clinical improvement. 
Comparison of the results on the Rorschach 
and MMPI indicates that the Ss of the study 
were able to reduce their schizophrenia scores 
on the former but not on the latter. It might 
be assumed that a well-structured task such 
as the MMPI Sc scale would prove more re- 
sponsive to role playing than a less structured 
one such as the Rorschach. Scores obtained 
by the experimental group were in the range 
typically found by other investigators when 
studying schizophrenics, which suggests a cer- 
tain degree of validity in the present findings 
insofar as the standard instruction MMPI is 
concerned. Sorting the 78 cards proved to be 
a long and laborious task, however, and it was 
not easy to maintain all Ss’ interest in the test 
or to ensure a proper set. Furthermore, a 
number of items were found to be worded in 
a complex and possibly confusing fashion. 
(Example: “I do not often notice my ears
ringing or buzzing.”) Answering such items
under standard instructions seemed difficult 
for a number of Ss; with the added operation 
required under role playing instructions, the 
difficulty seemed to be compounded. Whether 
the present results reflect these difficulties, 
which might have led to random sorting on 
the role playing task, or whether the results 
suggest a true inability of schizophrenics to 
predict how normals would respond to these
test items, remains an unanswered question
here. 

Because results with the MMPI in the pres- 
ent study are at variance with the results re- 
ported by Grayson and Olinger (1957), a 
sample of 15 schizophrenic women, none of 
whom had been included in the study re- 
ported here, were administered the entire 
MMPI under both standard and role taking 
instructions. An analysis of this small sample 
confirms the results found in the present 
study. No significant changes were found in 
the apparent ability of this small sample 
to reduce signs of psychopathology on the 
MMPI under role taking instructions. In a 
sample of 14 schizophrenic men tested with 
the MMPI under standard and role taking 
instructions, there was again no significant 
reduction in signs of psychopathology from 
the first to the second test administration on 
any MMPI variable. In comparing the sam- 
ple of cases used in the present study with 
the sample used by Grayson and Olinger, it 
seems possible that their sample consisted of 
cases who did not have as extensive psycho- 
pathology as the sample used in the present 
investigation. Furthermore, it is possible that 
there is a difference in general language fa- 
cility. These two differences may account for 
the different findings. The ability to take the 
normal role appears to be a variable trait 
which is partially related to severity and 
longevity of psychopathology and to subse- 
quent clinical history. Level of intelligence, 
however, especially language facility, may 
also be related to this trait. 


SUMMARY 


In the present study an attempt was made 
to throw further light on role playing in 
schizophrenia. On the basis of previous re- 
search, it was hypothesized that acutely ill 
schizophrenics would be better able to play 
the normal role than chronically ill ones, and 
that whether acutely or chronically ill, schizo- 
phrenics who subsequently improved would 
be better able to play the normal role than 
those who did not. Although the results were 
all in the predicted direction, they did not 
generally achieve high statistical significance.

OF PSYCHOTHERAPY

An assumption underlying most forms of
psychotherapy is that the relationship be- 
tween the therapist and his patient is the ve- 
hicle for therapeutic change. More specifically, 
the benefits from therapy are believed to vary 
directly with the quality of the therapist-pa- 
tient relationship (Betz & Whitehorn, 1956; 
Freud, 1949; Rogers, 1951; Snyder, 1959). 
It is frequently assumed that the fact of a 
patient’s remaining in treatment may be in- 
terpreted as evidence of the “goodness” of 
the relationship and therefore of the prob- 
ability of an ultimately successful outcome. 

These widely held beliefs have not, however,
gone unchallenged. Eaton (1959) recently
warned that a “good relationship”
may indeed interfere with therapeutic outcome.
He stated that a “good relationship
may help influence the client to become dependent
on such help and to continue seeking
it,” thereby defeating the therapeutic
goal of helping the client to achieve autonomy.
Redl and Wineman (1951) also
pointed out the potential limitations of a 
seemingly good relationship. They stressed 
that the therapist who establishes a close, 
warm, and permissive relationship with a pa- 
tient may find himself occupying the non- 
therapeutic role of “friend without influence.” 
These contradictory viewpoints may be due 
to the fact that the investigators used differ- 
ing definitions of the concept good relation- 
ship. What is required is a more explicit defi- 
nition of the therapist-patient relationship 
concept and the systematic testing of it. To
date there have been very few experimental
tests made regarding the association between
favorable outcome and quality of the
therapist-patient relationship (Heine, 1950; Holt
& Luborsky, 1952; Snyder, 1953).

1 This study was conducted at the Henry Phipps
Psychiatric Clinic of the Johns Hopkins Hospital,
Baltimore, Maryland. The author expresses grateful
acknowledgment to J. D. Frank, E. Ascher, H. Kelman,
D. Rosenthal, E. Nash, and A. R. Stone for
their cooperation and many helpful suggestions.

There are many theoretical frames of ref- 
erence from which the concept of relation- 
ship may be viewed, yet, according to Fiedler 
(1950), these differences may readily be sub- 
sumed under one general description of the 
“ideal therapeutic relationship.” He reported 
that therapists of diverse theoretical persua- 
sions revealed a remarkable degree of agree- 
ment in characterizing the ideal therapeutic 
relationship. The very concept of the ideal 
therapeutic relationship appears, however, to
violate the clinician’s belief that to be effec- 
tive a relationship must be adapted and 
modified to meet the particular needs of a 
given patient. An examination of the Fiedler 
instrument is reassuring since the descriptive 
statements are written at such a high level of 
abstraction as to encompass the relationship 
needs of most patients as well as most non- 
patients. This study, therefore, employs Fied- 
ler’s view of the therapeutic relationship to 
provide an operational definition of this elu- 
sive concept. 

This report describes an effort to test, in a 
group therapy setting, the correlations be- 
tween patient-change, remaining in treatment, 
and quality of therapeutic relationship. Since 
the definition of the construct “therapeutic 
relationship” has not been widely agreed 
upon, and no criteria have been accepted as 
valid, this study is to be viewed as a test of 
the concept’s construct validity (Cronbach & 
Meehl, 1955). It is further recognized that 
the construct validity of the outcome criteria 
employed here also may be regarded as under
study.

The concept of the “therapeutic process”
presupposes that psychotherapy proceeds in 
a discernibly systematic step-wise fashion. 
Therefore, some investigators believe that 
certain kinds of intermediate change may be 
viewed as harbingers of significant benefit to 
the patient. The investigator who accepts this 
idea ascribes to a variety of phenomena the 
status of “enabling” or intermediate condi- 
tions necessary for beneficial change. Unfor- 
tunately, the relationship between the alleged 
roadmarkers and the destination is not as yet 
established. In group therapy, for example, it 
may be encouraging to the therapist to note 
increased group cohesiveness, evidences of 
group support and stimulation, establishment 
of multiple transferences, recall of repressed 
material, resolution of transferences, etc. That 
such phenomena do not invariably eventuate 
in clinical improvement will be conceded even 
by the most ardent group therapist. The au- 
thor decided, therefore, to concentrate on cri- 
teria that related to ultimate goals, such as 
providing symptomatic relief and improving 
social functioning, rather than intermediate 
goals. The assumption that “improvement” is 
a unitary phenomenon is questionable (Kel- 
man & Parloff, 1957). This is especially the 
case where improvement is “less than com- 
plete recovery.” This broad category unfor- 
tunately includes a considerable proportion 
of all patients treated. If, then, improvement 
cannot be discussed in global terms, it is 
necessary to specify the various criteria and 
measures. 

The three criteria of improvement adopted 
in this study are based on the work of Kogan 
and Hunt (1950) and Miller (1951). These 
criteria are: Comfort, Effectiveness, and Ob- 
jectivity. The first two criteria, Comfort and 
Effectiveness, are based on the belief that the 
general aim of psychotherapy is to ameliorate 
the patient’s suffering and to restore him to 
an effective level of functioning in the com- 
munity. Comfort was defined in terms of 
symptoms or feelings which had caused dis- 
tress. Effectiveness was defined as the degree 
of competence with which the patient man- 
aged to fulfill his own needs and desires as 
well as those of others. With the third cri- 
terion, Objectivity, an attempt was made to 
take cognizance of a value which is shared by
a number of quite different psychotherapeutic
approaches. All assert that the better the in- 
dividual understands himself the freer he will 
be to react appropriately to conditions arising 
in his current life. Objectivity is not an end- 
point per se but a generally accepted means 
to an end. 

That the patient remain in treatment is a 
necessary but not sufficient condition for psy- 
chotherapy to be effective. The amount of 
time necessary for change to occur varies 
from patient to patient. The patient may re- 
main in treatment and yet fail to improve. 
Although a patient who drops out of therapy 
may have derived considerable benefit, his 
departure may preclude any objective assess- 
ment of this benefit. The factors that deter- 
mine whether a patient will remain in treat- 
ment may or may not coincide with those 
which determine whether he will improve if 
he does remain. 

In the present study it was hypothesized
that changes evidenced in Comfort, Effectiveness,
and Objectivity are related to the quality
of the therapeutic relationship. It was
also hypothesized that remaining in treatment 
is similarly a function of the goodness of the 
relationship. 


METHOD 


The eight instruments used in this study to meas- 
ure the criteria (Comfort, Effectiveness, and Objec- 
tivity) will be described only briefly. A fuller de- 
scription may be found elsewhere (Kelman & Parloff, 
1957).2 Since the therapy goals to be reported were
the amelioration of discomfort and the modification 
of ineffectual behavior, the scales measuring Comfort 
and Effectiveness were reversed to measure instead 
the degree of “Discomfort” and “Ineffectiveness.” 

Judgments regarding the patient’s Discomfort, In- 
effectiveness, and Objectivity were made by research 
teams composed of a psychiatrist, social worker, and 
psychologist. Specially designed “evaluation” scales 
were filled out independently by each member of 
the research team in describing each patient. These 
judges then met to discuss their ratings and to arrive 
at an overall single staff rating for the scale measuring
each criterion. In addition to these staff ratings,
two measures of Discomfort were obtained from the
patients, one measure of Ineffectiveness was derived
from ratings made by group members of each other,
and two measures of Objectivity were obtained by
(a) comparing the patient’s self-description with an
independent staff observer’s description of him, and
(b) determining the accuracy of the patient’s predictions
of the ratings he would receive from each of
his fellow patients. The staff members who participated
in completing the staff evaluation scales did
not act as judges of the therapeutic relationship for
the same patients except in two cases.

2 Copies of measures used in the evaluation of psychotherapy
have been deposited with the American
Documentation Institute. Order Document No. 6464
from ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress; Washington
25, D. C., remitting in advance $1.75 for microfilm
or $2.50 for photocopies. Make checks payable
to: Chief, Photoduplication Service, Library of Congress.


Measures of Discomfort 


1. Self-Satisfaction Q sort (Patient): This at- 
tempted to tap the degree of congruence between 
the patient’s perceived self and ideal self regarding 
behavior in the group therapy situation. A 60-item 
“perception” Q sample was employed. It is based on 
Bion’s (1950) group interaction concepts. 

2. Symptom Disability Checklist (Patient): This is 
a modification of the Cornell index. Forty-one items 
referring to psychic or somatic complaints were rated 
by the patient in terms of the relative distress they 
caused him during the week preceding testing. 

3. Discomfort Evaluation Scale (Staff): This con- 
sists of items describing 10 areas of interpersonal dis- 
comfort. The scale was filled out independently for 
each patient by three staff raters: psychiatrist, social 
worker, and psychologist. These judges then met to 
discuss the ratings and to agree on an overall single 
staff rating on the basis of their combined clinical 
judgment. 


Measures of Ineffectiveness 


1. Ineffectiveness Evaluation Scale (Fellow Pa- 
tients): Each patient rated each of the other pa- 
tients in his therapy group on three dimensions: the 
extent to which the rater respected another patient’s 
ideas and opinions, regarded him as a group leader, 
and desired to be friends with him. Ratings on each 
dimension were made on a four-point scale and were 
reported individually as measures of Respect, Leader- 
ship, and Friendship. The average overall rating a 
patient received on the three measures was also com- 
puted. 

2. Ineffectiveness Evaluation Scale (Staff): This 
scale consists of 15 items in which the patient’s crea- 
tivity, productivity, and fulfillment of social roles 
are rated. The ratings concern the degree of appro- 
priateness of the behavior and the frequency with 
which it occurred in relation to significant persons 
in the patient’s home and community life. The above 
mentioned staff members independently completed 
this form. On the basis of a conference a unified staff 
rating was made. 


Measures of Objectivity 


1. Objectivity Q sort (Patient-Observer): Objec- 
tivity was measured by the degree of congruence be- 


tween the patient’s description of his group behavior 
and the staff observer’s description of the patient’s 
behavior in the group. The Q sort items were the 
same as those used for measuring self-satisfaction. 

2. Objectivity Evaluation Scale (Patient-Fellow Patient):
In completing the group questionnaire previously
described, patients were asked to predict the
ratings which they would receive from each of their
fellow group members. The average discrepancy between
the ratings each patient expected from each
fellow patient and the ratings he actually received
was computed for each of three areas: Respect,
Friendship, and Leadership.

3. Objectivity Evaluation Scale (Staff): This scale
consists of four items attempting to measure the accuracy
of the patient’s perceptions of his own behavior
and the behavior of others. Staff ratings were
made independently and then combined into a single
staff rating by the conference method.
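
The discrepancy measure used in the Patient-Fellow Patient scale can be sketched as follows. This is a minimal illustration under two assumptions that the text does not settle: discrepancies are taken as absolute values, and ratings run from 1 to 4. The patient labels and ratings are hypothetical.

```python
def mean_discrepancy(expected, received):
    """Average gap between predicted and actual ratings for one area.

    expected and received map each fellow patient to a rating on the
    four-point scale. Absolute differences are an assumption here.
    """
    raters = sorted(expected.keys() & received.keys())
    if not raters:
        raise ValueError("no fellow-patient ratings to compare")
    return sum(abs(expected[r] - received[r]) for r in raters) / len(raters)


# Hypothetical ratings of one patient by three fellow group members.
expected = {
    "Respect": {"P2": 3, "P3": 4, "P4": 2},
    "Friendship": {"P2": 2, "P3": 2, "P4": 3},
    "Leadership": {"P2": 1, "P3": 2, "P4": 1},
}
received = {
    "Respect": {"P2": 2, "P3": 4, "P4": 4},
    "Friendship": {"P2": 2, "P3": 2, "P4": 3},
    "Leadership": {"P2": 3, "P3": 3, "P4": 1},
}

# One discrepancy score per area; a lower score means greater Objectivity.
objectivity = {
    area: mean_discrepancy(expected[area], received[area]) for area in expected
}
print(objectivity)
```

A patient whose predictions match the ratings he receives obtains a discrepancy of zero in that area, as in the Friendship row of this made-up example.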

Except for the Symptom Disability Checklist, 
which was completed prior to therapy, all initial 
testings were made immediately after the fourth 
group session. All measures were repeated following
the twentieth group meeting. 


Drop-Outs 


Any patient who left the group prior to the twen- 
tieth session without his therapist’s approval was 
considered to have terminated prematurely. Four of 
the 21 patients were so designated. By the end of 
the experimental period each patient had attended 
an average of 9.6 sessions. The attendance ranged 
from 5 to 12 sessions.3


Therapeutic Relationship 


The technique developed by Fiedler (1950) was 
employed. He had the 75 statements in his Relation- 
ship Q Sample sorted by members of various schools 
of therapy. On this basis, an ideal therapeutic rela- 
tionship standard was developed. Twenty-five items 
concerned the therapist’s ability to communicate with 
and to understand the patient, 25 described the “emo- 
tional distance” between the therapist and patient, 
and the remaining 25 dealt with questions of “status”
as reflected in the therapist’s behavior toward the
patient. In the present study, the 75 items were used
by the observers to describe the relationship between
therapist and patient.4 These arrays were then correlated
against the Fiedler ideal therapeutic relationship
standard. The higher the correlation with the
standard, the “better” the relationship. Three trained
observers were used as judges to describe the therapeutic
relationship established by two therapists.
After a preliminary practice period, the interjudge
reliability in describing the same therapist-patient
interaction was found to be substantial. The correlation
between the relationships described by pairs of
judges in observing 19 patient-therapist interactions
was .92.5 One of the three judges was assigned to
each of the three groups and attended all group
meetings during the experimental period of 20 weeks.
Each judge described the relationships established by
the therapist with each patient in the group after
the second meeting, the twelfth meeting, and the
twentieth meeting. Each of the three descriptions
was correlated against the standard and the overall
relationship was described as the mean of the three
correlations.6

3 Prior to assignment to one of three therapy
groups, each of the 21 patients had been “screened”
by exposure to a 6-week orientation group. Experience
with group therapy had indicated that more
than one-third of the patients dropped out of therapy
by the end of the fifth session. This involved a loss
of the time invested in initial evaluations of such patients.
The aim of the orientation group was to expose
patients to a group experience similar to that
which they might experience in the actual therapy
situation. It was hoped that patients who survived
six sessions might then tend to remain in group therapy.
The selection process did, in fact, act to increase
the proportion of patients remaining beyond
the fifth hour in therapy. Only one dropped out of
treatment by the fifth hour.

The sample consisted of 21 psychoneurotic pa- 
tients, 10 male and 11 female. Fourteen of the pa- 
tients were classified as “psychoneurotic disorders,” 
five as “personality disorders,” one as “psychotic dis- 
order,” and one as “transient situational personality 
disorder.” They were randomly assigned to three 
groups, ranging in size from six to eight. A treated 
13 patients, 6 in one group and 7 in the other. B
treated one group of 8 patients. The data reported
are based on the first 20 sessions of each group and
are, therefore, limited to the early period of treatment.
Of the initial 21 patients, 14 completed all experimental
procedures by the close of the 20-week
period.7

4 Fiedler’s concept of the ideal therapeutic relationship
was derived primarily from experiences in individual
therapy. It was necessary, therefore, to determine
whether the group therapists in the present
study conceived of the ideal therapeutic relationship
in a similar fashion. Each therapist was asked to describe,
by means of the 75-item Q sort, his conception
of the ideal therapeutic relationship. These arrays
were correlated with the Fiedler standard. It
was found that A’s and B’s ideals correlated .86 and
.88, respectively, with the Fiedler “ideal.” It was
concluded, therefore, that the aims of these group
therapists were sufficiently consonant with the Fiedler
standard to permit its use as the criterion for
measuring the goodness of relationships.

5 Since the data collection involved the use of independent
group observers, the question of the interrater
and intrarater reliability is an important one.
Although the reliability measures described here appear
to be adequate, data are available only for the
initial period of the study. Since no further attempt
to check on the reliability of the judges was made
during the period of the study, there is no direct
evidence that judges continued to describe the therapeutic
relationship in a consistent manner throughout
the experiment. Indirect evidence on this point is
found in the fact that the patients’ descriptions of
their relationships with their therapists correlated
substantially with those ascribed to them by the
judges (rho = .79).

6 The necessary z transformations were made.
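
The relationship score described above (each Q-sort description correlated against the Fiedler standard, and the three correlations then averaged) can be sketched as follows. This is a minimal illustration, not the author's computation: it assumes product-moment correlations and Fisher's z transformation for the averaging, and it uses invented nine-item sorts in place of the 75-item Q sample.

```python
import math


def pearson(x, y):
    """Product-moment correlation between two equal-length arrays."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation(correlations):
    """Average correlations through Fisher's z (assumed method)."""
    zs = [math.atanh(r) for r in correlations]  # r -> z
    return math.tanh(sum(zs) / len(zs))         # mean z -> r


# Hypothetical pile placements; the real Q sample has 75 items.
ideal_standard = [1, 2, 2, 3, 3, 3, 4, 4, 5]
observer_sorts = [
    [1, 2, 3, 3, 2, 3, 4, 5, 4],  # after the second meeting
    [2, 1, 2, 3, 3, 4, 3, 4, 5],  # after the twelfth meeting
    [1, 2, 2, 3, 3, 3, 4, 5, 4],  # after the twentieth meeting
]

session_rs = [pearson(sort, ideal_standard) for sort in observer_sorts]
overall_relationship = mean_correlation(session_rs)
print(session_rs, overall_relationship)
```

Averaging in z units rather than averaging the correlations directly avoids the compression of the r scale near its upper limit.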

Each group met for an hour and a half once a 
week. The form of therapy was largely interpretive 
with the focus on the immediate interaction of the 
patients with each other and with the therapist. 

The two therapists qualified as “experts” as described
by Fiedler, i.e., each had completed prescribed
training, had been a practicing therapist for
a minimum of 5 years, and was considered an expert
by other therapists within his school.


RESULTS 


To determine whether improvement varies 
with the quality of the therapeutic relation- 
ship, product-moment correlations were com- 
puted between the Fiedler ideal therapeutic 
relationship scores and each of the 14 change 
scores.* Inspection of the therapeutic relation- 
ship scores revealed that the four patients 
treated by B had each achieved relationships 
which were higher than any established by 


7In addition to the four drop-outs already men- 
tioned, one patient left treatment when her husband’s 
job was transferred out of the city, and one failed 
to complete all evaluation procedures. Another pa- 
tient was excluded from the study when it was 
learned that he had supplemented group therapy 
with intensive individual therapy. The effectiveness 
of the group therapy relationship was, therefore, 
confounded with the individual therapy relationship 

8 To determine whether the evaluation measures 
were initially related to the quality of therapeutic 
relationships subsequently established, correlations 
were computed between initial scores on each of the 
14 measures and the overall mean therapeutic rela- 
tionship scores. The correlations obtained did not 
differ significantly from zero. The range was from 
126 to —.452. (In order for a correlation with 12 
degrees of freedom to be significant at the .05 level 
of confidence, a correlation of .532 is required.) 

To further test whether the therapeutic relation- 
ship was associated with initial scores on these 
evaluation measures, the initial scores of the seven 
patients who had therapeutic relationships above the 
group median were compared with the seven patients
whose therapeutic relationship scores fell below
the median. None of the group differences as
tested by the t test attained statistical significance.
In view of the apparent lack of association between 
the 14 initial evaluation scores and the quality of 
the subsequent therapeutic relationships established, 
we were justified in computing correlations between 
therapeutic relationships and the difference scores between
the initial and final evaluation scores.
the 10 patients treated by A. In effect, the
therapeutic relationships established by each 
therapist with his patients appeared to come 
from different “populations” of relationships. 
This suggested that the influence of the thera- 
peutic relationship variable, which is at issue 
in this investigation, may be confounded with 
other variables related to the individual thera- 
pist. 

The correlations between therapeutic rela- 
tionship and the measures of change were 
therefore computed independently for A’s pa- 
tients and for B’s patients. To obtain the best 
estimate of the true mean correlations for the
total sample a pooled correlation (r̄) was computed.9
As may be seen in Table 1, 3 of the
14 mean correlations differed significantly


9 To determine whether the patients of each therapist
differed on their initial evaluation measure
scores, a Mann-Whitney U test was computed for
each of the 14 measures. No significant differences 
were found. 


TABLE 1

POOLED MEAN CORRELATIONS BETWEEN CHANGE AND
MEAN THERAPEUTIC RELATIONSHIP
(Computed for 10 Patients Treated by A and 4 by B)

Measure                                                    Pooled r̄ (N = 14)

Discomfort
  A. Self-Satisfaction Q sort (Patient)                         .037
  B. Symptom Disability Checklist (Patient)                     .669**
  C. Discomfort Evaluation Scale (Staff)                        .012
Ineffectiveness
  A. Ineffectiveness Evaluation Scale (Fellow Patients)
     1. Respect                                                 .103
     2. Leader                                                  .613*
     3. Friend                                                  17 (as printed; one digit illegible)
     4. Overall Total                                           .460
  B. Ineffectiveness Evaluation Scale (Staff)                   .133
Objectivity
  A. Objectivity Q sort (Patient-Observer)                      .521
  B. Objectivity Evaluation Scale (Patient-Fellow Patient)
     1. Respect                                                 .183
     2. Leader                                                  .161
     3. Friend                                                  .013
     4. Overall Total                                           .082
  C. Objectivity Evaluation Scale (Staff)                       .669**

Note. The direction of “negative” scales has been reversed
so that a positive correlation between a criterion and relationship
indicates that the greater the relationship the greater the
improvement; a negative correlation indicates a negative
relationship between relationship and improvement.
* r significant at the .05 level, one-tailed test.
** r significant at the .01 level, one-tailed test.

TABLE 2

PATIENT-THERAPIST RELATIONSHIPS (z) ESTABLISHED
WITH A AND B

                    A                   B
          Group I     Group II      Group III

           .51 a        .74           1.28
           .23          .71           1.10
           .15          .63           1.06
           .10          .62           1.00 b
           .09          .53            .99
           .04          .24 c          .81 c
                        .17 d          .7? c
                                       .63 c

Mean       .19          .52            .95
SD         .17          .225           .214
N          6            7              8

a Moved out of city.
b Failed to complete evaluation procedures.
c Terminated prematurely.
d Supplemented group therapy with simultaneous individual therapy.

from zero.10 The findings indicate that the
more closely the therapeutic relationship approximated
the ideal relationship the greater
the increase in the patient’s Objectivity (as
evaluated by the staff, r̄ = .67, p < .01); the
greater the increase in group Effectiveness:
Leadership (as derived from ratings by fellow
patients, r̄ = .61, p < .05); and the
greater the relief from symptomatic Discomfort
(as reported by the patient, r̄ = .67, p <
.01). It is noted that the correlation between
therapeutic relationship scores and Objectivity
as measured by the Q sort falls just short
of reaching an acceptable level of significance
(r̄ = .52, while an r of .53 is required for p
< .05).
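
The text does not state how the within-therapist correlations were pooled. One common approach, sketched here purely as an assumption, is to transform each correlation to Fisher's z, average with weights n - 3, and transform back. The function name and the example correlations are hypothetical; only the group sizes (10 and 4) come from the text.

```python
import math


def pooled_r(groups):
    """Pool correlations from independent groups.

    groups: list of (r, n) pairs, one per therapist.
    Each r is converted to Fisher's z, the z values are averaged with
    weights n - 3, and the mean z is converted back to a correlation.
    """
    weighted_sum = sum((n - 3) * math.atanh(r) for r, n in groups)
    total_weight = sum(n - 3 for _, n in groups)
    return math.tanh(weighted_sum / total_weight)


# Hypothetical within-therapist correlations; group sizes as in the study.
example = pooled_r([(0.70, 10), (0.50, 4)])
print(round(example, 2))
```

Because the weights are n - 3, the larger group dominates the pooled value; with only four patients, the second therapist's correlation contributes a single unit of weight.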

These findings indicate that the quality of 
the therapeutic relationship does vary on cer- 
tain measures with patient-change in the 
areas of Objectivity, Effectiveness, and Com- 
fort when the therapist variable is controlled. 

That an association exists between the 
quality of the therapeutic relationship and 
the incidence of drop-outs is strongly sug- 
gested (see Table 2). When the eight pa- 
tients initially assigned to B’s group were 
ranked according to the quality of the thera- 
peutic relationship established, it was found 


10 Since the direction of the correlation was predicted,
a one-tailed test was applied.
TABLE 3

MEAN DIFFERENCE IN OVERALL GROUP RELATIONSHIPS

Comparison              Mean Difference

Group I vs. II
Group I vs. III
Group II vs. III


that those patients who had remained in 
treatment occupied ranks 1 through 5. Those 
who terminated prematurely were in rank or- 
der positions 6 through 8. In investigating 
the rank order position of the single indi- 
vidual who dropped out of one of A’s groups, 
it was found that of a group of seven the 
terminator had occupied position Number 6. 
Moreover, the patient in rank order position 
Number 7 was found to have supplemented 
his group therapy experience with intensive 
individual psychotherapy without having noti- 
fied the group therapist. Thus, the five who 
dropped out or found it necessary to supple- 
ment group treatment appear to have had 
the poorer relationships when compared to 
the other members of the particular group to 
which they were assigned. An inspection of 
the group in which no member dropped out 
showed that the relationships were quite uniform
and, incidentally, very low. The variance
within this group tended to be smaller than 
in the other groups. The mean relationship in 
this group (Group I) was the lowest of the 
three groups (see Table 3). Thus the group 
having the poorest relationships remained in- 
tact while the group having the highest rela- 
tionships lost 38% of its members. From 
Table 2 it appears that the quality of the 
therapeutic relationship is related to prema- 
ture termination; however, the absolute size 
of the relationship score appears to be less 
important than the terminator’s relationship 
score relative to those of other members of 
his group. 
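
The within-group ranking described above can be illustrated with the Group II column of Table 2. The scores below are read from that column with decimal points restored, which is an assumption about the printed values; the helper name is hypothetical.

```python
def rank_positions(scores):
    """Return each score's rank position within its group (1 = best relationship)."""
    ordered = sorted(scores, reverse=True)
    return {score: ordered.index(score) + 1 for score in scores}


# Group II relationship scores as read from Table 2 (decimal points restored).
group_ii = [0.74, 0.71, 0.63, 0.62, 0.53, 0.24, 0.17]
ranks = rank_positions(group_ii)

terminator_rank = ranks[0.24]      # patient who terminated prematurely
supplementer_rank = ranks[0.17]    # patient who added individual therapy

# Share of B's group (Group III) lost to premature termination: 3 of 8 members.
loss_fraction = 3 / 8

print(terminator_rank, supplementer_rank, loss_fraction)
```

The two lowest positions in the group of seven are the ones occupied by the terminator and the patient who supplemented treatment, and 3 of 8 is the 38% loss cited for Group III.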

In designing the study it was decided to use 
observers to describe the relationships since it 
was assumed that patients’ perceptions of 
their relationships with the therapist would 
be subject to the distorting influence of transference.
However, the fact that the patients’
premature termination of therapy appears to 
be related to the quality of relationships, as 
evaluated by observers, implies that the pa- 
tients experienced their relationships much as 
described by these judges. To investigate this 
apparent concordance, an attempt was made 
to determine the degree of agreement be- 
tween the observer’s and the patient’s de- 
scriptions of the relationships. At the end of 
the experimental period the 14 remaining pa- 
tients were asked to describe their relation- 
ship with the therapist by using the Fiedler 
Q sort. Various items were modified to clarify
technical terms. Patients were assured that 
the data they furnished was confidential and 
would not be relayed to their therapists. Their 
descriptions were then correlated with the 
ideal to derive “relationship scores.” Each 
patient’s relationship score was then corre- 
lated with his mean relationship score as pro- 
vided by the observer. The rho correlation 
was found to be .79 (p < .01). This finding 
strongly suggests that in these groups pa- 
tients were quite objective in perceiving their 
relationships with their therapists. Transfer- 
ence distortions as measured here appear to 
have played a surprisingly small role. 
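
A rank-order (rho) correlation of the kind reported here can be sketched as a product-moment correlation computed on ranks. The scores below are invented for illustration and are not the study's data; the function names are likewise assumptions.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Rank-order correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relationship scores from patients and from the observer.
patient_scores = [0.9, 0.4, 0.7, 0.2, 0.5]
observer_scores = [1.1, 0.3, 0.6, 0.4, 0.5]
rho = spearman_rho(patient_scores, observer_scores)
print(round(rho, 2))
```

Because only the ordering of the scores enters the computation, rho is unaffected by differences in scale between the patients' and the observer's Q-sort correlations.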

Although the focus of the paper has been 
on the effect of the patient-therapist relation- 
ship on the outcome of group therapy, it is 
recognized that the patient-patient relation- 
ships also play a significant role. In the sam- 
ple of patients studied it was found that those 
who established better relationships with their 
therapist reported a significantly greater in- 
clination to perceive other group members 
as being socially attractive (Ineffectiveness 
Evaluation Scale: Friendship) than those who
formed poorer relationships with their thera- 
pist. 


DISCUSSION 


The findings of this study appear to pro- 
vide limited support for the hypothesis that 
patient-change in psychotherapy is related to 
the quality of the therapeutic relationship es- 
tablished. Of a total of 14 change measures, 
3 revealed significant correlations with thera- 
peutic relationship scores. It is of interest 
that one measure of each criterion of improvement
(Discomfort, Ineffectiveness, and
Objectivity) showed this association. The
data indicate that the better the patient- 
psychotherapist relationship, the greater the 
symptomatic relief experienced by the pa- 
tient, the more likely it was that fellow group 
members would describe the patient as having 
become more dominant (leader), and the 
greater the increase in Objectivity attributed 
to the patient by the research staff. Since the 
treatment period included in this study was 
arbitrarily limited to the initial 20 weeks, it 
is not clear whether the associations de- 
scribed here characterize only the early stages 
of psychotherapy. It is not known whether 
these correlations are maintained, diminished, 
or increased, or whether additional correla- 
tions will be found between other outcome 
measures and the therapeutic relationship 
scores in later periods of psychotherapy. It 
is possible that not all the behaviors meas- 
ured in this study are modifiable at the same 
rate. The probability that a significant cor- 
relation would be found with the therapeutic 
relationship scores was not necessarily equal 
for each of the 14 measures. However, the re- 
searcher did assume that each measure had 


face validity and that therefore the hypothe- 
sis could reasonably be tested against each of 
these 14 measures.11


11 Experience with the Objectivity Evaluation 
Scale (Patient-Fellow Patients) indicates that it re- 
quired the raters of the four measures contained in 
this scale to make rather complex judgments. The 
ostensible task for the patient was to predict the 
rating he would receive from each patient on each 
of three measures: Respect, Leadership, and Friend- 
ship. In order to evidence Objectivity he had to be 
able to predict not only how he was perceived by 
each of his fellow patients but also the rating which 
each would be willing to assign to him. The rater is 
frequently keenly aware that the scores which he 
assigns to others will also provide the investigator 
with information about the rater. Therefore, the 
ratings are frequently intended primarily as a com- 
munication to the examiner rather than as an ob- 
jective report of the raters’ experiences and feelings 
about fellow patients. Some patients wish to be seen 
as warm and friendly and therefore assign high 
scores fairly indiscriminately to their fellow pa- 
tients. Others wish to be seen as aloof and independ- 
ent of the group members and therefore assign low 
scores. In addition, some patients appear reluctant to 
reveal warm feelings concerning group members of 
the opposite sex and therefore tend to minimize these 
ratings. 


Statistically significant changes for the 
group as a whole, over the 20-week period, 
occurred only on three measures: Symptom 
Disability Checklist, Discomfort Evaluation 
Scale (Staff), and Ineffectiveness Evaluation 
Scale (Staff). Only one of these change meas- 
ures was found to be significantly correlated 


with therapeutic relationship scores. Although


it is possible that individual patients experi- 
enced real changes even if the overall sam- 
ple did not, it is equally possible that the 
change scores used in this study may repre- 
sent measurement error, particularly in those 
instances where a lack of correlation with 
therapeutic relationship scores was shown. In 
those instances where no significant change 
occurred, the possibility of measurement error 
must be considered a possible explanation for 
the lack of correlation between change scores 
and therapeutic relationship scores. 

The question arises as to the explanation 
of the finding that one measure of each cri- 
terion showed a significant association with 
the quality of the therapeutic relationship, but 
other measures of the same criterion failed to 
show similar associations. One condition un- 
der which such findings could have occurred 
is that each of the three hypothesis-support- 
ing measures was independent of the other 
measures of a given criterion. To the degree 
that the various measures are independent of 
one another they may be expected to vary in- 
dependently with the quality of the thera- 
peutic relationship. A second condition which 
might account for these findings is that the 
measures were related to each other in a com- 
plex fashion—i.e., two measures may be cor- 
related under some circumstances, and not 
correlated, or even negatively correlated, un- 
der other circumstances. To determine the de- 
gree and nature of the association between 
the three criterion-supporting measures and 
the other measures of the relevant criterion, 
the initial scores on these measures were in- 
tercorrelated. Thus scores on the Symptom 
Disability Checklist were correlated with the 
other two Discomfort measures, Objectivity 
Evaluation Scale (Staff) scores were corre- 
lated with the other five Objectivity meas- 
ures, and Leader scores on the Ineffectiveness 
Evaluation Scale were correlated with the 
other four Ineffectiveness measures. The condition
of independence of measures was found
to be the case with the Objectivity group of 
measures. No significant correlations were 
found between the initial scores of the Ob- 
jectivity Evaluation Scale (Staff) and the re- 
maining Objectivity measures. Since the meas- 
ures apparently tap different aspects of the 
criterion, the fact that change on one meas- 
ure correlates with relationship scores does 
not lead to the expectation that change on 
other measures will also be associated with 
relationship scores. 

The possibility that some contamination of 
measures had occurred on the Objectivity 
Evaluation Scale (Staff) must be considered 
since in two instances one member of the 
three-man team of staff raters had also been 
the observer who described the therapeutic 
relationships. This fact need not cast serious 
doubt on the authenticity of the correlation 
between changes on Objectivity Evaluation 
Scale (Staff) and quality of therapeutic re- 
lationship for the following three reasons: 
(a) Staff ratings were determined by confer- 
ences of three staff members. There is no 
evidence that the opinion of any one team 
member was weighted disproportionately. (b)
Even under the most unfavorable circum- 
stances only 2 of the 14 cases could have 
been affected by the possible contamination 
of measures. (c) If the staff ratings and the 
judgments regarding therapeutic relationships 
had been spuriously correlated due to con- 
tamination of measures, the same association 
would also be anticipated with the other two 
staff ratings on Ineffectiveness and Discom- 
fort. No such evidence is found. 

The condition of “complexity of associa- 
tion” between measures appears to apply to 
the cases of Ineffectiveness and Discomfort
measures. 

In the case of Leadership, Ineffectiveness 
Evaluation Scale (Fellow Patients), it was 
found that it failed to correlate significantly 
with any other measure of Ineffectiveness with 
the exception of the Overall Total score. This 
is to be expected since the Overall Total 
measure is simply the sum of Leadership, 
Respect, and Friendship scores. Leadership 
scores, however, were not found to correlate 
with either Respect or Friendship. Appar- 
ently a patient’s dominant group behavior 


did not win him the respect or friendship of 
his peers. Since the Overall Total Evaluation 
Scale score is made up of components which 
are independent of each other, it is not sur- 
prising that changes on this measure do not 
correlate with therapeutic relationship scores, 
despite the fact that Leadership change scores 
do show a significant association with thera- 
peutic relationship scores. 

The Symptom Disability Checklist, a meas- 
ure of Discomfort, was found to correlate sig- 
nificantly (.55) with Self-Satisfaction Q sort. 
The expectation that changes in Self-Satisfac- 
tion Q sort scores might, like the Symptom 
Disability Checklist change scores, correlate 
with therapeutic relationship scores was not 
supported. An analysis of the Self-Satisfac- 
tion Q sort data revealed that two quite op- 
posite changes had occurred. Those patients 
who initially reported very high self-satisfac- 
tion appeared to become less content with 
themselves, while those who initially showed 
the greatest discontent tended to show greater 
self-satisfaction as therapy progressed. This is 
not a simple regression toward the mean for 
it was found that the patients who initially 
indicated that their group behavior very
closely approximated their ideal behavior 
were the ones who showed rather poor Ob- 
jectivity. As these individuals became increas- 
ingly aware of their actual behavior, this was 
reflected in their description of their group 
behavior as being less in accord with their 
rather stable ideals. As a result of this shift 
the initial correlation between self-satisfac- 
tion and symptomatic comfort was dissipated. 

The fact that a positive relationship was 
found between the quality of therapeutic re- 
lationship and three measures does not, of 
course, permit one to assign direction to this 
association. In the case of Leadership Ineffec- 
tiveness Evaluation Scale, for example, two 
equally plausible interpretations can be made: 
patients who established the better relation- 
ships with their therapists were able to be- 
come more assertive and dominant in their 
therapy groups; or, the therapist tended to 
relate to those patients who manifested in- 
creasing evidences of leadership and domi- 
nance in the group. Despite the fact that the 
findings are consistent with the general hy- 
pothesis that improvement follows upon the 
establishment of a good relationship, other
alternative interpretations must be consid- 
ered. The possibility that the relationship es- 
tablished may be associated with character- 
istics of the patient is not ruled out by the 
absence of a significant correlation between 
initial scores on the 14 measures used in this 
study and the quality of the subsequent thera- 
peutic relationship. On the contrary, there is 
compelling evidence that the therapist’s per- 
ception of his patients is intimately associ- 
ated with the quality of the relationship he is 
able to establish with them (Parloff, 1956). 
Such correlation was evidenced not only in 
the initial 4 weeks of therapy, but through- 
out the 20-week experimental period (Parloff, 
1953). 

It may be possible, for example, that one 
of the characteristics of the expert group 
therapist is his ability to identify the thera- 
peutic potential of his patients. As a conse- 
quence, he may then direct his attention to- 
ward effecting a positive relationship with 
those patients with whom he feels he can be 
most useful. Indeed, he may recognize and 
attempt to relate to those individuals who 
tend to improve seemingly independent of 
specific therapeutic efforts. 

Although the findings support the notion 
that Fiedler’s instrument has a measure of 
construct validity, the Q sort may offer only 
a partial or even a superficial definition of the
therapeutic relationship. The ideal therapeu- 
tic relationship standard defined by Fiedler 
seems to describe conditions essential to any 
meaningful social relationship, independent of 
therapeutic intent. This type of relationship 
may be therapeutic per se; it also is possible 
that this relationship may be principally a
prerequisite condition for the establishment 
of an as yet undefined and unmeasured thera- 
peutic relationship. Such a relationship may 
involve the utilization of specialized tech- 
niques and procedures which the therapist 
may regard as essential for treatment, e.g., 
analysis of transference, free association, 
dream interpretation, reliving of earlier emo- 
tional experiences, etc. One interpretation of 
the finding that patients who establish the 
better relationships with their therapists tend 
to perceive others as being more socially de- 
sirable may be that patients in a group take 


their cue in relating to each other from the 
quality of the therapist’s relationships with 
them. A more parsimonious explanation is 
that both therapist and patients react simi- 
larly to a given situation. 


SUMMARY 


This study reports an attempt to determine 
whether an association exists between the con- 
struct “therapeutic relationship” and outcome 
of treatment in a group therapy setting. The 
quality of the therapeutic relationship was 
measured by Fiedler’s ideal therapeutic rela- 
tionship Q sort. Three criteria of improve- 
ment were used: Comfort, Effectiveness, and 
Objectivity. These criteria were measured by 
14 scales. In addition, a study was made of 
the therapist-patient relationships established
with patients who terminated therapy pre- 
maturely. 

The sample included 21 patients, 13 of 
whom were treated by one group therapist 
and 8 by another. The experimental treat- 
ment period was limited to 20 weeks, at 
which time outcome data were available on 
14 patients. 

Patients who established better relation- 
ships with their therapist tended to show 
greater improvement than those whose rela- 
tionships with the same therapist were not as 
good. In computing the overall pooled mean 
correlations between the therapeutic relation- 
ship scores and change measures for 14 pa- 
tients, significant correlations were found on 
three measures: increased Objectivity (staff 
evaluation), increased Effectiveness (group 
leadership), and increased self-ratings of 
Comfort (symptomatic relief). 

Premature termination of therapy by a pa- 
tient appears to be related to his perception 
of the “goodness” of the relationship he has 
established with his therapist relative to the 
general level of patient-therapist relationships 
within his group. Individuals having the 
poorer relationships in a group tended to 
drop out of therapy irrespective of the absolute
goodness of their therapeutic relationship.

The hypotheses postulating that benefit 
from psychotherapy and incidence of pre- 
mature termination were associated with the 
goodness of the individual patient-therapist 
relationship tended to be supported. These
findings give limited support to the validity 
of the concept “therapeutic relationship” as 
defined by Fiedler. 
A FACTOR ANALYSIS OF GERIATRIC ATTITUDES


WILSON H. GUERTIN 1


University of Florida 


Despite current expressions of interest in 
geriatric patients, there is a surprising lack 
of attempts to evaluate the attitudes of the 
aged objectively. The only specific instrument 
available is the Activities and Attitudes Survey
of Cavan, Burgess, Havighurst, and Goldhamer
(1949). Nor have there been factor
analytic attempts to define the prominent di- 
mensions underlying the expressed attitudes 
of these people. The present paper is a re- 
port of a factor analysis of such attitudes 
and serves as a basis for the construction 
of a forced-choice Geriatric Attitude Scale 
(Guertin & Krugman, 1961). 


PROCEDURE 


A total of 166 “agree-disagree” items were com- 
posed to provide a wide sampling of the attitudes of 
institutionalized aged. Attitudes relating to prob- 
lems of adjustment were emphasized, with a few 
psychiatric and physical disability items included.

Forty-eight male residents of the Veterans Ad- 
ministration Center at Martinsburg, West Virginia, 
satisfactorily completed all the items. These subjects 
were all 60 or more years old, and none carried
psychiatric diagnoses.

After excluding 16 items because responses to them 
were too uniformly in one direction, the remaining 
150 items were subjected to a rough linkage analy- 
sis. Key sort cards provided frequencies for a four- 
fold contingency table for each possible pairing of 
items. A link was noted when the tetrachoric correlation
was greater than .40 between the items. These
linkages and the direction of the relationships were 
transferred to slips which were laid out on the floor 
in the form of an intercorrelation matrix. Cluster 
items were identified by observing that half or more 
of the item linkages (row entries) in a column were 
the same for a pair of test items. 
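
The linkage step can be sketched from a fourfold table of agree-disagree counts. The paper does not say how the tetrachoric correlations were obtained; the cosine-pi formula used below is a standard quick approximation and is substituted here as an assumption. The cell counts are hypothetical, and treating a link as an absolute value above .40 is also an assumption, based on the statement that the direction of each relationship was recorded separately.

```python
import math


def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a: both items agreed with      b: item 1 agreed, item 2 disagreed
    c: item 1 disagreed, item 2 agreed      d: both items disagreed
    """
    concordant, discordant = a * d, b * c
    if concordant == 0 and discordant == 0:
        raise ValueError("fourfold table is degenerate")
    if discordant == 0:
        return 1.0
    if concordant == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt(concordant / discordant)))


def is_linked(r, threshold=0.40):
    """A link is noted when the correlation exceeds the threshold in size."""
    return abs(r) > threshold


# Hypothetical counts for one pair of items answered by 48 subjects.
r_pair = tetrachoric_approx(20, 4, 6, 18)
print(round(r_pair, 2), is_linked(r_pair))
```

The approximation is most trustworthy when both items split near 50-50, which is in keeping with the exclusion of the 16 items whose responses ran too uniformly in one direction.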


1 Arnold D. Krugman collaborated with the author 
in devising the items and supplied the data used 
herein. The study was conducted while the author 
was employed at the Veterans Administration Hos- 
pital, Knoxville, Iowa. 


RESULTS 


Eight important clusters were identifiable 
but the complexity of interrelations made it 
clear that only factor analysis could clarify 
the underlying structure. Therefore, 27 promi- 
nent cluster items were selected to provide a 
matrix of tetrachoric interrelations. Multiple- 
group factor analysis and blind rotation to 
oblique simple structure by the single plane 
method produced the factor matrix in Table 1. 
Items employed in the factor analysis are 
identified by asterisks, but the calculation of 
additional item loadings was necessary to 
provide the content for an understanding of 
the nature of the obtained factors. 

Items in Table 1 without asterisks are those 
not originally factor analyzed, but for which 
loadings were calculated by extending the 
factor matrix (Cattell, 1952, p. 406). Only 
those items with heaviest loadings are re- 
ported here. Item descriptions are in abbrevi- 
ated form and decimal points have been 
omitted for convenience in presentation. 

It may be of interest to follow the fate of 
the eight clusters of variables. Three defined 
three of the multiple-group factors with a 
single variable from a fourth cluster pulled 
into one of the factors. Another cluster split 
to form two factors. The three remaining 
clusters failed to contribute uniquely to the 
factor structure. 

Conventionally, factors obtained from the 
multiple-group factor analysis are rotated to 
orthogonal positions to make it possible to 
calculate communality and residuals. The 
orthogonal matrix then is rerotated blindly to 
oblique simple structure. Since the first ma- 
trix obtained in factoring often approximates 
the final oblique simple structure solution, it 
is of some interest to compare the two matrices.
The following are five items in the
original factor analysis, selected from Table 1 
as having the single highest loading in each 
of the factors. The values in the first column 
of the pair are for the multiple-group load- 
ings while the second are for loadings from 
the final rotated matrix. 

[Table of paired loadings for the five items; only the values 46, 20, 35, and 16 are legible in the source.]
Basic and prominent general attitudes underlie
the original 27 × 27 intercorrelation
matrix as testified by the very high 72% of
the total variance accounted for by the five
factors. The total estimated communality was
19.12, which was completely accounted for
by the five factors. Communalities, calculated
from orthogonal loadings, are listed in Table
1. Intercorrelation between factors is generally
high as seen in Table 2.


TABLE 1

ROTATED OBLIQUE FACTOR LOADINGS

Factors (columns): Anxiety-Dysphoria, Alienation, Preserved Interest,
Physical Complaints, Incapacitation

Item Description

Always fearful* 81
Quite unhappy 81
Need much sleep*
Sick frequently*
Feel unloved*
Restless sleep
Lonely
Can’t keep mind on things
Unwanted
Best time of life is now
Worry too much*
Have high blood pressure*
If there’s a Heaven I’ll go*
Best time of life is when child*
Don’t care what happens to me*
Family doesn’t care about me*
No self-respect left*
Don’t care to see relatives
Nothing left to live for*
Don’t believe in God*
Nothing interests me anymore
Don’t eat enough
Don’t think will live a year
Don’t like to go visiting*
Have lots of friends
Very religious*
Money root of evil
Wish had more freedom*
Wars root of all trouble
Mentally younger than appearance
Older cast off by younger
Trouble walking*
Life has been tragic
Wish had better clothes*
People inconsiderate of others*


26 43 
4) 
32 
18 
49 
92 
22 
60 
24 
45 
67 
54 
—()2 
—04 


62 
22 
65 
49 
41 


60 
54 


58 
TABLE 1 (Continued)

Item Description

No sex interests
Nothing left to live for
Bad headaches frequent*
Fair to retire people at 65
Stomach trouble
Worry about health*
Dizzy spells
Feel helpless
Often very unhappy
Wish had better education
Part of body paralyzed*
Way of living very unnatural*
Women ruined my life
Drs. & nurses don’t care about us
Have not lived good life*
Get annoyed easily
Future holds nothing*
Young can’t be bothered with old
People make own troubles*
Death a relief from suffering*

TABLE 2

INTERCORRELATIONS BETWEEN OBLIQUE FACTORS


DISCUSSION 

The Anxiety-Dysphoria factor combines the 
feelings of fear, tension, and being unwanted. 
Preoccupations with health and social situa- 
tion reflect personal instability and general un- 
easiness. Since the manifestations are largely 
subjective, a superficially satisfactory adjust- 
ment is not precluded by their presence. How- 
ever, the underlying lack of self-confidence 
and general insecurity represent a deficiency 
in an attribute necessary for flexible adapta- 
tion to environmental change. 


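[Table 1 (Continued), loadings printed under the heading Phys. Compl.: 90, 84, 82, (illegible), 76, 73, 70, 69, 68, 62; item assignments are not recoverable from the source.]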


The Alienation factor demonstrates hostil- 
ity and dysphoria as a reaction to feeling re- 
jected. It is a disgruntled reaction reflecting 
ambivalence toward dependency. While love 
and help are desired strongly, these needs are 
vehemently denied. Hostility, which drives 
others away, serves as a defense against suc- 
corance by them. These attitudes probably 
find ready expression in response to efforts of 
others to establish independence in the aged. 

The Preserved Interest factor represents a 
relatively high level of interest in self and en- 
vironment. While there may be some narrow- 
ing of interests with aging, and certainly re- 
duced activity, the factor reflects strength 
and scope of interest as a resource. This char- 
acteristic permits a high level of social ac- 
tivity which may lead to the rewards of be- 
ing well liked. Triteness and superficiality en- 
ter into determining this factor so that while 
possession of the characteristics of this factor 
may be essential to being interesting and well 
liked, garrulousness may have an adverse
effect.
The Physical Complaints factor represents
a focusing of interest on the self in terms of 
body functions. Since there is no control for 
physiologically based illness built into the 
study, we must assume a variety of rea- 
sons for the complaints. They may be based 
upon systemic malfunctioning and anatomical 
changes associated with aging, chronic or 
acute disease, or may represent a hypochon- 
driacal exaggeration. 

The Incapacitation factor is based upon
crippling physical disease and the reaction to 
it. The significance of some of the heavily 
loaded items is not apparent and may be 
rather specific for the sample of subjects em- 
ployed. However, it is a sizeable factor and 
cannot be disregarded. 


Wilson H. Guertin


SUMMARY 


Geriatric attitudes of 48 elderly residents 
of a veterans administration center were sam- 
pled. Analysis revealed five important atti- 
tudinal factors: Anxiety-Dysphoria, Aliena- 
tion, Preserved Interest, Physical Complaints, 
and Incapacitation. 


JUDGMENTS AND THE DRAW-A-PERSON TEST

Southern Methodist University


The Draw-A-Person Test (DAP), as developed
by Machover (1949) and Goodenough
(1926), has become an important
part of the clinical psychologist’s battery of
assessment techniques. While widely used to 
provide information regarding the intellectual 
functioning and emotional and social behav- 
ior of a person, there is a paucity of ade- 
quate data regarding its empirical validity. 
Swenson (1957), in an extensive survey of 
the literature regarding human figure draw- 
ings, found modest confirmation of some 
of Machover’s hypotheses regarding group 
trends, but little evidence of the value of the 
DAP for individual diagnosis. The evidence 
would also indicate that the validity of the 
method for determining the level of a person’s
intellectual functioning is better established
than is its validity for predicting the
more complex and less well defined patterns 
of social and emotional behavior. An addi- 
tional deficiency in the existing data is that 
these studies which have attempted to deter- 
mine the validity of the DAP for predicting 
social behavior have tended to use the pooled 
judgments of clinicians rather than investi- 
gating the problem of the individual clini- 
cian’s contribution to the resultant predic- 
tion, e.g., Tolor and Tolor (1955). 

This study represents an attempt to pro- 
vide additional data regarding the validity of 
the DAP with regard to predicting intellec- 
tual, social, and emotional criteria, and also 
to provide data regarding the ability of indi- 
vidual clinicians to make such predictions. 


METHOD 


Subjects were 60 fourth grade school children, each 
of whom furnished a drawing of a person of each 
sex, done according to the usual DAP procedures. 
The judges were three clinical psychologists, each of 
whom was experienced in the use of the DAP and 
regarded as professionally competent. The judges
were asked to rate the drawings for the traits of
Intelligence, Sociability, and Emotional Maturity.
The ratings were done using a nine-point scale. The 
only information available to the judges was the age 
and sex of the child, which of the two drawings was 
completed first, and a detailed description of the test- 
ing procedure and criterion definitions for the three 
traits. 

The criterion for Intelligence was the child’s score 
on the Otis Quick-Scoring Mental Ability Test (Beta, 
Form EM), the criterion for Sociability was the 
preference rating given to the child by his fellow 
students on a sociogram, and the Emotional Ma- 
turity of the student was judged from a teacher's 
rating in which each teacher nominated the five most 
well-adjusted and the five most seriously emotion- 
ally disturbed boys and girls in her room. The 
teacher nomination form developed by Smith (1958) 
was used. On the basis of this technique the subjects
were divided into groups of poor, average, and above 
average adjustment. 


RESULTS 


Distributions of the ratings for each of the
traits by each of the judges were examined
and found to be generally normally distributed,
as were the criterion scores. The assumptions
for computing the Pearson product-moment
correlation were met, and this method
of statistical analysis was used.1 As the criterion
for emotional adjustment was essentially
trichotomous, all correlations regarding
this trait were corrected for coarseness of
grouping.

1 All statistical computations were done with the
assistance of the Southern Methodist University
Computing Laboratory on the Univac 1103.


TABLE 1

CORRELATIONS BETWEEN JUDGES’ RATINGS AND
INDEPENDENT MEASUREMENTS

                                                   Emotional
                      Intelligence   Sociability   Adjustment
Judge A                  .253*          .158         -.142
Judge B                  .496**         .006          .151
Judge C                  .583**         .179          .165
Average for
  Judges A, B, C         .455**         .115          .153

*p < .05.
**p < .01.


TABLE 2

CORRELATIONS BETWEEN JUDGMENTS
AND JUDGES

                                              Judge
Traits                                  A        B        C
Intelligence-Sociability              .176     .337**   .609**
Intelligence-Emotional Adjustment     .381**   .447**   .668**
Sociability-Emotional Adjustment      .082     .872**   .884**
Average Intercorrelations             .220     .615**   .750**

**p < .01.


Table 1 gives the correlations between the 
judges’ ratings and the criterion for each of 
the three traits, as well as the average judge 
intercorrelations for each trait.2 Table 2 gives 
the intercorrelations between the ratings for 
the three traits by each of the judges, as well 
as the average trait intercorrelation for each 
judge. Table 3 gives the intercorrelations be- 
tween the three judges for each of the three 
traits as well as the average intercorrelations 
of the judges for each of the traits. 
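
The averaging of correlations described in footnote 2 (conversion to Fisher z, averaging, and conversion back) can be sketched as follows. This is a minimal illustration, not the authors' program; the function name `average_correlations` is hypothetical, and the inputs are the Intelligence and Sociability columns of Table 1.

```python
import math


def average_correlations(rs):
    """Average correlation coefficients via Fisher's z transformation."""
    # r -> z is the inverse hyperbolic tangent; z -> r is the hyperbolic tangent.
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Judges A, B, and C from Table 1 (values as printed in the table).
intelligence = [0.253, 0.496, 0.583]
sociability = [0.158, 0.006, 0.179]

print(round(average_correlations(intelligence), 3))
print(round(average_correlations(sociability), 3))
```

Under these assumptions the results fall within rounding distance of the tabled averages of .455 and .115.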

2 All correlations were converted to Fisher z coefficients
for averaging and then converted back to
correlation coefficients.

DISCUSSION

As is evident in Table 1, the three judges
were able to predict the intelligence test performance
of the subjects to a significant degree.
However, there is a considerable difference
among the judges in their ability to
predict this criterion. For example, a test of
the significance of a difference between correlation
coefficients indicated that Judge C
is significantly better able to predict this trait
than is Judge A. None of the judges is able
to predict the Sociability criterion or the Emo- 
tional Adjustment criterion to a significant 
degree. Even more distressing is Judge A’s 
prediction of Emotional Adjustment which is 
negatively correlated with the criterion. Ef- 
forts to obtain multiple correlations between 
optimally combined judges’ ratings and the 
trait criteria resulted in multiple Rs of .614, 
.230, and .293 for Intelligence, Sociability, 
and Emotional Adjustment, respectively. We 
must conclude that even optimum weighting 
of the clinician’s judgments does not produce 
significant prediction of the latter two criteria. 
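
The text does not say which test of the difference between correlation coefficients was used. As a rough sketch only, the familiar Fisher z test for two independent correlations is shown below, applied to the Intelligence correlations of Judges C and A with N = 60. Because both judges rated the same children, a test for dependent correlations would be more appropriate; the independent-samples form and the function name `fisher_z_difference` are assumptions made for illustration.

```python
import math


def fisher_z_difference(r1, r2, n1, n2):
    """Fisher z test for two correlations, treated as independent (an assumption)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / standard_error
    # Two-tailed p value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value


# Judge C (.583) versus Judge A (.253), both based on the 60 children.
z_stat, p_value = fisher_z_difference(0.583, 0.253, 60, 60)
print(round(z_stat, 2), round(p_value, 3))
```

Under these assumptions the difference exceeds the conventional .05 criterion, which is at least consistent with the conclusion stated in the text.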

In the design of the study an effort was 
made to choose criterion areas which would 
be distinguishable from each other, and whose 
specific criterion scores would tend to be
independent of each other. The intercorrelations
of the criteria furnish some data bearing on 
the extent to which this effort was successful. 
The correlation between the intelligence scores 
and the sociogram scores was .414; between 
intelligence scores and the teacher ratings on 
Emotional Adjustment, .495; and between the 
sociogram scores and the teachers’ ratings of 
Emotional Adjustment, .458. Each of these 
correlations is significant beyond the .01 level. 
It can be concluded that the criteria were not 
completely independent, but were only mod- 
erately so. 

TABLE 3

CORRELATIONS BETWEEN JUDGES

                                                       Average
                                         Emotional     Correlation
Judges      Intelligence   Sociability   Adjustment    for Judges
A-B         [illegible]       .243         .307*          .390
A-C         [illegible]    [illegible]     .349**         .445
B-C           .702**          .495**       .504**         .657
Average on
  Variables   .615**          .350**       .390**

*p < .02.
**p < .01.

From Table 1 it can be concluded that
Judge C is the best judge of the criteria, while
Judge A seems to be the poorest overall. This
difference in the ability to predict the criteria
seems to be due to the joint influence
of a judge’s ability to predict a subject’s intelligence
test performance and to the extent
to which a given judge implicitly viewed intelligence
as a trait related to Sociability and
Emotional Adjustment. As indicated above, 
both the Sociability and Emotional Adjust- 
ment criteria were correlated with the Intelli- 
gence criterion. Partialing out the effect of 
the Intelligence criterion reduced the correla- 
tion between Sociability and Emotional Ad- 
justment to .320, indicating a fairly large con- 
tribution to the zero order correlation be- 
tween these two criteria. Table 2 shows quite 
clearly that Judge C produced trait ratings 
that were highly correlated with each other, 
while Judge A’s trait ratings showed a signifi- 
cant correlation only in the case of the In- 
telligence-Emotional Adjustment traits. Since 
the Intelligence criterion contributes heavily 
to the other criteria, and since Judge C is not 
only the best predictor of this criterion, but 
also the judge who shows the strongest tend- 
ency to use ratings on this dimension as an 
element in predicting the other criteria, it 
would follow that Judge C would appear to 
be the most accurate judge overall. The con- 
verse of this argument would hold for Judge 
A, and the same argument would support the
appearance of Judge B as the judge with an 
intermediate degree of overall predictive abil- 
ity. It would appear that the impression of 
overall predictive accuracy conveyed by a 
given clinician is, in this case at least, a 
function of the clinician’s ability to develop 
a valid index of Intelligence from the DAP 
and little more. 
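
The partialing step mentioned above can be checked with the standard first-order partial correlation formula. The sketch below uses the three criterion intercorrelations reported in the text (.458, .414, and .495); the function name `partial_correlation` is hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# x = Sociability (sociogram), y = Emotional Adjustment (teacher ratings),
# z = Intelligence (Otis scores).
r_partial = partial_correlation(r_xy=0.458, r_xz=0.414, r_yz=0.495)
print(round(r_partial, 3))
```

With these inputs the formula gives a value that rounds to the .320 reported in the text.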

Table 3 reflects the extent to which the 
judges agree with each other in their ratings 
of the three traits. As would be expected from 
the other tables, the judges show their best 
agreement when it comes to predicting In- 
telligence from the DAP, and much less agree- 
ment when it comes to predicting the other 
two criteria. The higher correlations between 
Judges B and C can be explained by the 
greater extent to which the judges appear to 
be reacting to a general factor in their trait 
definitions, and would indicate that these gen- 
eral factors are similar for the two judges. 
The present study was not designed to tell us 
just what this general factor may be. It may 
be simply a liking for some drawings over 
others, a reliance on certain common refer- 
ences in the literature, a carry-over from 
their general impression of the person’s adjustment,
or it may be some halo effect derived
from a source in the drawings as yet
unknown. 

The findings of this study would not, in 
general, support the use of the DAP as a 
measure for predicting behavior criteria in 
the area of Sociability or Emotional Adjust- 
ment. The findings would lend support to the 
use of the DAP as a measure of intellectual 
functioning, but would also support the earlier 
reviews which suggest that the relationships 
are not adequate for individual prediction. 


SUMMARY 


The present study was designed to provide 
additional data regarding the ability of clini- 
cal psychologists to predict criteria of intelli- 
gence, sociability, and emotional adjustment 
from human figure drawings. The subjects 
were 60 fourth grade school children who 
were given the Draw-A-Person Test in a 
group situation. Three clinical psychologists 
judged the extent to which each child’s draw- 
ings indicate the existence of one of the three 
trait criteria. The relations of the clinical 
judgments to the criteria were statistically 
compared. 

The psychologists were able to predict in- 
telligence to a statistically significant degree, 
but were unable to predict either sociability 
or emotional adjustment. Although working 
independently, the judges did show a signifi- 
cant amount of correlation with each other 
in their predictions. 

Factors influencing the ability of the judges 
to produce ratings that would correlate with 
the criteria are discussed. 


REFERENCES

Goodenough, Florence L. Measurement of intelligence
by drawings. Yonkers-on-Hudson: World
Book, 1926.
Machover, Karen. Personality projection in the
drawings of a human figure. Springfield, Ill.:
Charles C Thomas, 1949.
Smith, L. M. The concurrent validity of six personality
and adjustment tests for children. Psychol.
Monogr., 1958, 72(4, Whole No. 457).
Swenson, C. H., Jr. Empirical evaluations of human
figure drawings. Psychol. Bull., 1957, 54, 431-466.
Tolor, A., & Tolor, Belle. Judgment of children’s
popularity from their figure drawings. J. proj.
Tech., 1955, 19, 170-176.


(Received January 1960) 








Journal of Consulting Psychology 
1961, Vol. 25, No. 1, 46-5 


AN APPLICATION OF PREDICTION TABLES TO THE
STUDY OF DELINQUENCY1


PETER F. BRIGGS, ROBERT D. WIRT, AND ROCHELLE JOHNSON


University of Minnesota 


The rate of occurrence of a characteristic in 
a specified population has come to be called 
the base rate of the characteristic. Meehl and 
Rosen (1955) have discussed the importance 
of considering base rates in evaluating a pre- 
dictive system. They point out that “a psy- 
chometric device, to be efficient, must make 
possible a greater number of correct decisions 
than could be made in terms of the base rates 
alone”2 (p. 194). An illustration used by
these authors was the prediction of juvenile 
delinquency by the Gluecks (Glueck & 
Glueck, 1950). In their example where the 
base rate concept had been ignored, the data 
were in effect treated as though the base rate 
were 50%, which is highly unlikely. The 
present authors have cross-validated the same 
predictors and the conclusions drawn by Meehl 
and Rosen were borne out (Wirt & Briggs,
1960). 
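
The efficiency criterion quoted from Meehl and Rosen can be illustrated with a short sketch. The overall hit rate is taken here as the base rate times the valid positive rate plus its complement times the valid negative rate, and it is compared with what one would achieve by always predicting the more frequent outcome. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
def overall_hit_rate(base_rate, valid_positive, valid_negative):
    """Proportion of correct decisions made by the device at a given base rate."""
    return base_rate * valid_positive + (1.0 - base_rate) * valid_negative


def beats_base_rate(base_rate, valid_positive, valid_negative):
    """True if the device outperforms always predicting the commoner outcome."""
    base_rate_accuracy = max(base_rate, 1.0 - base_rate)
    return overall_hit_rate(base_rate, valid_positive, valid_negative) > base_rate_accuracy


# A hypothetical device that is right for 80% of positives and 80% of negatives.
for base_rate in (0.50, 0.10):
    print(base_rate,
          round(overall_hit_rate(base_rate, 0.80, 0.80), 2),
          beats_base_rate(base_rate, 0.80, 0.80))
```

In this hypothetical case the hit rate stays at .80 whatever the base rate, because the valid positive and valid negative rates are equal. The device is nevertheless efficient only at the 50% base rate: at 10%, calling everyone negative is already correct 90% of the time.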

Further examples of the importance of the 
base rate can be found with ease. An inter- 
esting recent article by Schofield and Balian 
(1959) compares the incidence of psychic 
trauma among normal and schizophrenic pa- 
tients. Their results indicate that the base 
rate for trauma is so high that it is obviously 
not peculiar to their schizophrenic sample. 
This study followed the form suggested by 
Pearson and Kley (1957) who remark: 


Eventually, like it or no, we will have to come to
grips with the high probability that the base rate
problem applies in the prediction of mental disorder
from kind and number of traumatic life experiences,
just as it applies in the case of psychometric prediction
(p. 407).

1 This research was supported in part by Grant
1151c from the National Institute of Mental Health,
Public Health Service, United States Department of
Health, Education and Welfare; and in part by the
Graduate School of the University of Minnesota.
The authors wish to express their appreciation to
the federal government and to their university.

2 There is one minor exception to this rule, viz.,
when the valid positive rate equals the valid negative
rate, accuracy of prediction is independent of
the base rates.


Pearson and Kley go on to lament with others 
the fact that behavioral scientists do not tend 
to consider base rates in their study of case 
materials. 

The present study suggests that prediction 
in the area of delinquency is not an alto- 
gether lost cause at this time if one’s goals 
are reasonable or moderate and one tends to 
remain in the relatively narrow context in 
which the criterion data were gathered. In 
this connection Pearson and Kley (1957) 
suggest 

. . . that individuals in a population with a known
and relatively high incidence rate for a particular 
disorder may be submitted to longitudinal investi- 
gation of a kind which would not be economical for 
samples drawn from the general population (p. 400). 


Although their argument was aimed at the dis- 
covery of etiological factors, one may also 
examine efficiency of a treatment program 
through the study of a highly concentrated 
sample of cases among whom pathology may 
be expected to occur. 

The approach used here was the multiple 
criteria technique discussed by Meehl and 
Rosen (1955). The first criterion was the 
Minnesota Multiphasic Personality Inventory 
(MMPI); cases were selected whose MMPI 
profile had certain scale elevations that were 
known to be related to delinquency (Hath- 
away & Monachesi, 1951). The second cri- 
terion was selection within this group using 
family history data developed in another 
study by the authors. It was shown in the 
earlier study that a great number of family 
history factors, especially those commonly 
recognized as tragic, can be related to de- 
linquency (Wirt & Briggs, 1959). Thus with 
a well recognized personality technique and
one of a number of possible indicators of 
family disorder, it was possible to obtain the 
results demonstrated below. 


METHOD 
Samples 


From a previously described sample of nearly two 
thousand boys tested by Hathaway and Monachesi 
(1951) during the school year 1947-48, a sample of 
573 cases stratified on MMPI code and delinquency 
was drawn from a population of 1,958 cases. These 
subjects had been tested in the ninth grade and 
most of them were thirteen years old at the time. 
Delinquency ratings were based on the period fol- 
lowing testing, so that in this study the prediction 
of delinquency refers to the prediction of a subse- 
quent phenomenon. 

The sample of boys was dichotomized: one group
was composed of boys whose MMPI codes contained 
combinations of three delinquency “excitor” scales 
and this subsample of 201 cases represented 550 
cases in the population. The remaining group was 
composed of boys whose MMPI codes did not in- 
clude the excitor combinations. There were 372 cases 
representing 1,408 cases in the population.3 The excitor
MMPI scale combinations were: Pd-Sc, Pd-Ma,
Sc-Pd, Sc-Ma, Ma-Pd, and Ma-Sc. (See Hathaway 
& Monachesi, 1951, and Wirt & Briggs, 1959, for 
justification of these procedures.) Since the code
numbers for these scales are 4 = Pd, 8 = Sc, and
9 = Ma, the excitor code sample is called the “489” 
group. 

Most studies of delinquents have used some con- 
tact with the police as a defining criterion, omitting 
the question of severity altogether. Delinquent se- 
verity is not a homogeneous personality dimension 
but probably reflects a sociological or judgmental 
dimension of the society or of the perceiver of the 
delinquent (Wirt & Briggs, 1959). The definition of 
delinquency adhered to in this study was a rating 
based upon court and police docket records. A di- 
chotomous split was made: (a) a less severe cri- 
terion including all cases who had any contact with 
the police and (b) a more severe criterion excluding 
persons whose contacts with the police involved only 
minor infractions. 


3 The sample/population proportions are not equitable
because the samples actually include subsamples
which were weighted differentially to produce a
valid estimate of population values.


Family History Factors

The source from which these data were derived
was a survey of the records of 11 social agencies
(voluntary and governmental). The case records of
all boys and their families which were identified in
the admission files of the 11 agencies were completely
reviewed for data which seemed psychologically
important. The data were collected in note
form and were found to include 42 fairly common
but discrete items. These items were grouped into 
seven more general categories: family disruption, 
poverty or need, dissocial behavior, psychiatry for 
family, marital disruption, inadequate parent-child 
relationship, and minor psychological problems. The 
present study focused on data from one of these 
categories, “family disruptions due to disease,” which 
included six items: mother dies, father dies, mother 
chronically ill, father chronically ill, siblings die, or 
siblings are chronically ill. A category score could 
be determined by assigning one point for each item 
present in the family history. Thus a score of 1 
point meant that at least one of the items was true 
for a given family, 2 points meant that two items 
were true or that one item occurred twice, etc.

It is to be understood that these items of infor- 
mation were recorded in the social agency records 
before the subjects had been delinquent and before 
they were tested by Hathaway and Monachesi in 
1948. The delinquency that is referred to occurred 
after testing and thus also after any particular dis- 
ruptions of the family due to disease. 


RESULTS 


The data are presented for two degrees of 
delinquency. The less severe criterion, which 
nets of course the largest percentage of delin- 
quents, shows the estimated overall popula- 
tion rate of delinquency to be 41%. Among 
cases with elevated excitor (i.e., 489) codes, 
the proportion of delinquents was approxi- 
mately 43%. Using the slightly more severe 
criterion of delinquency in which minor of- 
fenses were excluded from the delinquent 
sample, the overall population rate was 32% 
delinquent, while the rate for the 489s was 
35% delinquent. 

Among cases with instances of family dis- 
ruption due to disease, the rate of delinquency 
for all codes tended to increase as the num- 
ber of family disruptions due to disease in- 
creased. That is, accuracy of delinquency 
prediction improved for cases known to have 
family disruptions due to disease regardless 
of MMPI patterns of the subjects. By calling 
the whole population delinquent the accuracy 
of prediction would be only 32%; by calling 
cases with a score of 1 or more points on 
family disruption due to disease delinquent, 
accuracy of prediction would rise to 43%, 
at a score of 2 or more points accuracy would 
reach 53%, and accuracy of such prediction 
increases to a maximum of 63%. Thus if one 
is interested in selecting potential delinquents 
for treatment, knowledge of social agency contact
would be an asset. It should be noted,
however, that this does not aid much in prediction
of nondelinquency.


TABLE 1

PROPORTION OF BOYS WITH 489 MMPI CODES AT EACH OF SEVEN LEVELS OF
FAMILY DISRUPTION DUE TO DISEASE WHO BECAME DELINQUENT AT A BASE
RATE OF 43% DELINQUENT

[Table body not recoverable from the source. Legible column headings:
Number of Disruptions; Delinquents (est. cf, p1); Nondelinquents
(est. cf, p2); Prediction Ratios.]

Note. Apparent discrepancies in table are due to rounding figures.


When MMPI code is taken into account,
accuracy in prediction of delinquency is increased
even more. The cases with 489 codes
have a frequency of 28% in the population.
The complete results for the 489 cases are
presented in Tables 1 and 2 for the two different
levels of delinquent severity. This is a
demonstration of the prediction model developed
by Meehl and Rosen (1955). Here
the first column indicates the score (i.e., the
number of disruptions in the family due to
disease) from 0 through 7 or more. The second
column gives the cumulative frequencies
of these 489 cases who became delinquent at
each social agency score from a maximum of 
7 or more points to a minimum of 0; these 
are estimates of the population cumulative 
frequencies. The column entitled p1 transforms
these cumulative frequencies to proportions
based on the total number of 489
cases that became delinquent. The third column
gives the cumulative frequencies for
those 489 cases that did not become delinquent
at each social agency score. Correspondingly,
the column p2 transforms these cumulative
frequencies to proportions based on
the total number of 489 nondelinquents.

For example, a p1 of .17 at a family disruption
score of 2 indicates that 17% of the
489 cases who became delinquent scored 2 or
more points on this particular social agency
scale. Similarly, a p2 of .05 indicates that
only 5% of the nondelinquents scored 2 or
more points on this scale.


TABLE 2

PROPORTION OF BOYS WITH 489 MMPI CODES AT EACH OF SEVEN LEVELS OF
FAMILY DISRUPTION DUE TO DISEASE WHO BECAME DELINQUENT AT A BASE
RATE OF 35% DELINQUENT

[Table body not recoverable from the source. Legible column headings:
Number of Disruptions; Delinquents (est. cf, p1); Nondelinquents
(est. cf, p2); Prediction Ratios.]

Note. Apparent discrepancies in table are due to rounding figures.


The proportions given in p1 and p2 do not
take into account the base rates.

Column q2 represents the valid negative
rate before the base rates are taken into account.
This indicates the proportion of all
489 nondelinquents that are appropriately
labeled “nondelinquent” at each social agency
scale score. The column entitled Pp1 is p1
multiplied by the base rate of delinquency
for 489s and represents the valid positive rate.
The column Qp2 is p2 multiplied by the base
rate of nondelinquency and represents the
false positive rate. Qq2 is q2 multiplied
by the base rate of nondelinquency and represents
the valid negative rate. Ht is Pp1 +
Qq2 and represents the overall accuracy of
prediction, which is the accuracy with which
one can call individuals falling at and above the given
cutting score delinquents and those
below it nondelinquents. The column called
Rp is Pp1 + Qp2 and represents a sort of selection
ratio telling the proportion of people
for whom one is predicting at each level on
the social agency rating (i.e., at a score of 1
or greater, a score of 2 or greater, etc.). And
finally, a column Hp is Pp1/Rp and it represents
the proportion of people at each level
on the social agency rating who are delinquent.
This is the accuracy with which one
can predict delinquency alone with no regard
for false negatives. It is the most useful statistic
if one is trying to select a small sample
for treatment.
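
The column definitions above can be collected into a short sketch that computes one row of a prediction table. It assumes that q2 is simply 1 - p2, which is how the valid negative rate is read here, and it uses the example values quoted in the text (p1 = .17 and p2 = .05 at a score of 2) together with the 43% base rate for the 489 group. Pairing those values with that base rate is an assumption, and the names `PredictionRow` and `prediction_row` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredictionRow:
    q2: float   # valid negative rate before base rates
    Pp1: float  # valid positive rate
    Qp2: float  # false positive rate
    Qq2: float  # valid negative rate
    Ht: float   # overall accuracy of prediction
    Rp: float   # selection ratio
    Hp: float   # proportion of those selected who are delinquent


def prediction_row(p1: float, p2: float, base_rate: float) -> PredictionRow:
    """Compute the derived columns for one cutting score."""
    P = base_rate
    Q = 1.0 - base_rate
    q2 = 1.0 - p2  # assumption: q2 is the complement of p2
    Pp1 = P * p1
    Qp2 = Q * p2
    Qq2 = Q * q2
    Rp = Pp1 + Qp2
    Hp = Pp1 / Rp if Rp > 0 else float("nan")
    return PredictionRow(q2, Pp1, Qp2, Qq2, Pp1 + Qq2, Rp, Hp)


row = prediction_row(p1=0.17, p2=0.05, base_rate=0.43)
print(row)
```

Because the table bodies are not legible in the source, the output of this sketch cannot be checked against the printed entries and should be read as an illustration of the arithmetic only.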

These tables show the advantage of pro- 
ceeding from a high base rate of delinquency. 
The selection ratio, Rp, indicating the per- 
centage of people for whom prediction is 
made at each level does not change appreci- 
ably with the change in rate (compare Rp in 
Table 1 with Rp in Table 2). Yet the ac- 
curacy of delinquency prediction, Hp, (and 
therefore the percentage of people at each 
level who are delinquent and who are cor- 
rectly identified) increases as the percentage 
of delinquents increases in the population. It
should be noted that at the more severe criterion,
with a 35% delinquency rate in the
overall population, it is possible to identify
a sample that is 79% delinquent although
this group represents only approximately 6%
of the 489 cases. At the less severe criterion of
delinquency where the overall rate is 43%, it
is possible to identify a sample that is almost
88% delinquent and again it includes only
6% of the cases in the 489 population.


TABLE 3

NUMBER OF CASES WHICH WOULD BE FOUND DELINQUENT
FOR THREE TYPES OF SELECTION STARTING
WITH 1,000 BOYS CHOSEN RANDOMLY

Selection                 Severity    N Selected
Random                    Severe         1000
                          Mild           1000
All 489                   Severe          281
                          Mild            281
489 and 3 Disruptions     Severe           16
                          Mild             16

[Entries in the "Success at Each Rate" columns are not legible in the source.]


The success with which delinquency can be 
predicted based on a hypothetical 1,000 cases 
using the information described above is 
shown in Table 3. Depending upon the needs 
of the individual situation, it is possible to 
predict that 316 boys in a population of 1,000 
would be fairly severely delinquent and as 
useful information is added, one can follow 
the 1,000 cases down to the precise but re- 
stricted sample obtained through the study of 
16 cases of whom 14 will be delinquent. 
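
The narrowing from 1,000 boys to the final 16 can be approximated from figures given in the text. The sketch below multiplies the number selected at each stage (from Table 3) by the rounded delinquency rates quoted in the Results. Since the success columns of Table 3 are not legible, these expected counts are recomputed estimates rather than the authors' entries; the text's figure of 316 severe delinquents per 1,000, for instance, presumably reflects an unrounded rate. The helper name `expected_delinquents` is hypothetical.

```python
def expected_delinquents(n_selected, rate):
    """Expected number of delinquents among those selected, to the nearest case."""
    return round(n_selected * rate)


# (selection, N selected, rounded rates quoted in the text)
stages = [
    ("Random", 1000, {"mild": 0.41, "severe": 0.32}),
    ("All 489", 281, {"mild": 0.43, "severe": 0.35}),
    ("489 and 3 disruptions", 16, {"mild": 0.88, "severe": 0.79}),
]

for selection, n_selected, rates in stages:
    for severity, rate in rates.items():
        print(selection, severity, n_selected, expected_delinquents(n_selected, rate))
```

For the last stage under the less severe criterion this gives 14 of 16, in line with the figure cited in the text.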


DISCUSSION 


The principles involved in selection de- 
scribed here are empirical rather than theo- 
retical and do not carry general implications 
concerning the cases selected. It would be 
judged from other data not presented that a 
number of such techniques could be developed 
using different factors within the histories of 
delinquent boys. Some such techniques would 
probably be better than others and some 
would be more stable than others. Without 
theoretical knowledge of the reasons for the 
operation of each of the particular criteria 
established, it would be impossible to know 
when environmental or cultural changes would 
affect the validity of the techniques that are 
presented or developed. These tend to be the 
risks involved in using such an approach as 
the present one. Of course, it is not impossible
to build in safeguards, to define populations,
and to cross-validate from time to time,
in order to raise the likelihood that one is 
not practicing an impoverished statistical 
ritual. 

It was indicated earlier that it is likely that 
delinquency is not a homogeneous psychological
variable. It is obvious that the selection
of delinquents through some particular set of 
criteria will net a very special subpopulation 
of delinquents, a population which is not a 
random sample of all delinquents. Therefore, 
the research worker employing selection tech- 
niques based upon factors which heighten the 
likelihood of delinquency in a specific way 
will tend to meet with a specific type of case. 
This should be an advantage since there is 
some insurance of homogeneity within the 
subpopulation of delinquency besides the like- 
lihood of future delinquency. Furthermore, 
the procedures themselves, although not theo- 
retically involved in the understanding of 
delinquency, certainly suggest some factors 
which are important in the development of 
delinquency within the cases selected. There- 
fore, within the framework of criteria de- 
scribed in this paper, where delinquency 
seems to result from a combination of pres- 
ent personality status (characterized by poor 
judgment, excitability, and a certain unrealis- 
tic approach to the events of life) plus an ab- 
normal home in which tragedy and disease 
have left their permanent marks, suggestions 
for treatment are not as hard to make as if 
one were dealing with the random delinquent. 
Furthermore, one would suspect that cases se- 
lected in the same way might require similar 
treatment programs since such selection is a 
quasidiagnostic procedure. 


SUMMARY 


A technique for the discovery and identifi- 
cation of potentially delinquent boys was 
illustrated in this paper. A sample of 13-year- 
old boys, drawn randomly from a general 
urban population, was evaluated using the 
MMPI and a survey of their family histories
obtained from social agencies. A multiple criteria
approach to the identification of the predelinquent
case was developed, starting with
cases from the general population. To this the 
factors of MMPI codes and instances of se- 
vere disease or of death among members of 
the family were added. Using these two cri- 
teria, it was possible to develop small sub- 
populations which were about 80% saturated 
with predelinquent boys. Such subsamples
when compared with the general population
were approximately twice as dense with predelinquent
cases since the general population
had a rate of about 40%.

The possibility of using this technique in
the establishment of treatment programs
where small samples could be handled in
areas where large numbers of delinquents are
found was discussed. It was pointed out that
such subpopulations would not be random
samples of delinquents, but that such subpopulations
would in fact be more homogeneous
subsamples of delinquents than are
usually obtained.


Statements regarding interpretation of projective
tests, such as the House-Tree-Person
or the Figure Drawing Tests, were, in good 
part, originally based on intuition. To date, 
little experimental evidence has come forth to 
support or deny many of the assertions made 
by clinicians using such instruments. 

A frequently accepted hypothesis regarding 
the Draw-A-Person Test is that a subject (S) 
who draws first a person of the opposite sex, 
has a problem of sexual identification. While 
Machover (1949) has adduced empirical evi- 
dence in support of the hypothesis, Granick 
and Smith (1953), Barker, Mathis, and Pow- 
ers (1953), and several other studies cited in 
Swenson’s review article (1957) failed to con- 
firm the asserted relationship. 

The present study relates the order of figure
drawing to the discrepancy between self-concept
and ideal self, mother, and father, as determined
from the Leary Interpersonal Check List (1956).
Four specific hypotheses were tested:


1. Ss drawing the opposite sex first have
different self-concepts (as revealed by the
Leary Interpersonal Check List) from Ss
drawing the same sex first.

2. Ss drawing the opposite sex first have a
greater similarity of self-concept with their
perception of the parent of the opposite sex
than Ss drawing the same sex first.

3. Ss drawing the opposite sex first will
show a greater discrepancy of the self-concept
with their ideal self-concept than Ss
drawing the same sex first.

4. If Hypothesis 3 is supported, then those
Ss who draw the opposite sex first will have
an ideal self-concept resembling that of the
parent of the same sex more than Ss who draw
the same sex first.


1 Our thanks are given to Martin S. Sloane,
Superintendent, for his permission to carry on this
research.


METHOD 


One hundred and fourteen college undergraduate
students (57 females, 57 males) were given the
Draw-A-Person Test and the Leary Interpersonal
Check List (1956). The mean age for the females
was 20.54 years and for the males 21.77 years.
This difference in age is significant at the .02
level. Although our male and female samples differ
significantly in age, it is felt that this is not a
serious drawback, particularly in view of Swenson and
Newton's (1955) study which indicated that age
differences were not a significant factor in sex
differentiation beyond the eighth grade. The mean years
of college education for the females was 2.93 and
for the males 2.58, while for the two combined it
was 2.75. The difference in years of education is not
significant (t = 1.46).

On the Leary each S was instructed to agree or
disagree with 128 descriptive words or phrases which
he would use in describing himself, his mother, his
father, and his ideal self. When the Draw-A-Person
Test was administered, careful note was made of the
sex of the first drawn figure. This was to be the basis
upon which the groups would be divided. Thirteen of the
57 females drew a male figure first, while 16 of the
57 males drew the female figure first. The difference
is not significant (chi square = .40).

The template scoring system was used for each
Leary protocol. Then each Dominance and Love
vector was computed. The means and sigmas of
the vectors were computed for the various groups.
Table 1 gives these data. These groups include
females drawing their own sex first (Ff), those
drawing the opposite sex first (Fm), males drawing a
male figure first (Mm), and the male group drawing
the female first (Mf). Several t tests were run for
Groups Ff and Fm, Mm and Mf, total female and
total male. As a final analysis of the data Table 2
indicates the t values between the means of different
ratings on Dominance and Love vectors for the groups.


TABLE 1

MEANS AND SIGMAS OF VECTORS FOR FOUR RATINGS OF ALL GROUPS

(Two columns of means are legible in this part of the table;
their column headings are not recoverable.)

Group            Mean     Mean
Ff               51.64    61.25
Fm               53.84    61.46
F total          50.14    61.29
Mm               48.85    61.22
Mf               50.19    59.37
M total          49.23    60.70
Total M & F      50.68    60.99


RESULTS 


Few differences between group means were
significant. Out of 24 t's, only two reached
the 5% level: men vs. women, on self-concepts
scored for Love, and women drawing
same sex first vs. women drawing opposite
sex first, on mother concept scored for Love.
These last findings could well be chance
fluctuations.
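The remark about chance fluctuations can be made concrete. The sketch below assumes, purely for illustration, that the 24 t tests are independent and that every null hypothesis is true; the tests in the study share subjects and ratings, so the figures are only a rough guide. The function name is hypothetical.

```python
from math import comb

def prob_at_least(k, n, alpha):
    """Binomial probability of k or more 'significant' results in n independent tests."""
    return sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k, n + 1))

n_tests, alpha = 24, 0.05

expected_by_chance = n_tests * alpha               # expected number of chance hits
p_two_or_more = prob_at_least(2, n_tests, alpha)   # chance of at least two hits

print(f"expected significant t's by chance: {expected_by_chance:.1f}")
print(f"probability of two or more by chance: {p_two_or_more:.2f}")
```

Under these assumptions about one significant t is expected by chance alone, and two or more would turn up in roughly a third of such sets of tests.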

The groups as a whole (considered on the 
basis of their sex differences only) showed a 
difference between the means of the Love 
vectors (p= .05) on the self-rating scale. 
Examination of Table 1 shows us that the fe- 
males rate themselves higher on those dimen- 
sions tapped by the Love vector, while males 
scored higher on the Dominance vector. The 
implications of this will be dealt with in the 
discussion section. 

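TABLE 1 (continued)

(One further column of means is legible; its heading is not
recoverable. A single sigma, 8.36, is also legible in the
last row.)

Group            Mean
Ff               49.09
Fm               47.92
F total          48.42
Mm               49.98
Mf               47.06
M total          49.16
Total M & F      48.99


In Table 2 we can examine the significance
tests between the means of the various ratings
for the Dominance and Love vectors for
the Ff, Fm, Mm, and Mf groups. The Ff row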
shows us that the only difference which is not 
significant for the Dominance vector is that 
between Father and Ideal Self. Two differ- 
ences fail to reach significance, however, for 
the Love vector, Self-Father, and Mother- 
Ideal Self. The Fm group shows a different 
pattern from the Ff in that the Self-Mother 
ratings on the Dominance vector are not sig- 
nificantly different. On the other ratings in 
this vector these two groups are essentially 
similar. For the Love vector the Fm group is 
quite similar on most of the ratings with the 
exception of the Self-Ideal Self ratings. The 
Fm sample does not differ on these two rat- 
ings, while for the Ff sample the difference 
between Self and Ideal Self is highly signifi- 
cant (p= .001). 

The males who drew the male figures first
(Mm) have significant differences between all
the ratings on the Dominance vector except
Father-Ideal Self, a finding common to Ff
and Fm groups as well. Two comparisons of
ratings have small differences on the Love
vector for the Mm group. The Self-Father
ratings show no significant differences, as do
the Mother-Ideal Self ratings. The overall
pattern of significant differences is largely
similar for both groups drawing their own
sex first in both the Dominance and Love
vectors.


TABLE 2

t's BASED ON MEAN DIFFERENCES BETWEEN INDICATED PAIRS OF
RATINGS FOR VARIOUS GROUPS

Vectors: Dominance Vector, Love Vector
Pairs of ratings: Self-Mother, Self-Father, Self-Ideal Self,
Mother-Ideal Self, Father-Ideal Self
Groups: Ff, Fm, F total, Mm, Mf, M total, M & F total

Legible entries, in printed order (row and column assignment
uncertain):

Following the Dominance, Self-Ideal Self heading:
-9.13**, 9.31**, -10.75**, -7.41*, -6.00**, -7.02, -9.00**

Following the Mother-Ideal Self heading:
-6.95**, -9.81**, -6.54*, -9.07**, -4.75**, 5.57*, 4.98**, -7.21*

-4.18**, -5.69*, -4.53**, -3.73**, -4.25, -3.88**, -4.34**

Following the Father-Ideal Self heading:
-1.88*

Paired columns headed Mother-Ideal Self and Father-Ideal Self:
 1.45    -7.34*
+453     -6.93*
 0.09     7.25**
+0.83     4.87**
 0.31     8.00*
+0.51     5.75**
+0.06    -6.50**

Final legible entry: 2.04

* Significant at .05 level
** Significant at .01 level

Those males drawing the female first (Mf) 
had no significant difference between ratings 
of Self and Mother on the Dominance vector, 
much as the Fm group also showed. And once 
again, as each group thus far has shown, there 
is no significant difference between Father 
and Ideal Self. 

Four ratings show no significant differences 
in the Love vector for the Mf group. The 
Self-Mother, Self-Father, Self-Ideal Self, and 
Mother-Ideal Self ratings are without the sig- 
nificant differences which characterize the Mm 
males. The most outstanding differences be- 
tween the two male groups on the Love vector 
are on the ratings between Self and Mother 
and between Self and Ideal Self, with the Mm 
males showing the greatest differences. 

Again the overall pattern of significant dif- 
ferences is practically identical for both 
groups drawing the opposite sex first, a find- 
ing which applied to both vectors. It would 
appear that the sex of the S doing the draw- 
ing does not differentiate the interrating com- 
parisons as much as the sex order in which 
the figures are drawn. 

An inspection of the column of the com- 
parisons between parent ratings and the Ideal 
Self ratings reveals a consistent trend. In the 
Dominance vector it is noted that all groups 
ideally tend to identify (i.e., show no signifi- 
cant differences between Father and Ideal Self 
ratings) with the perceived masculine quali- 
ties of the father but not with the mother ex- 
cept in one instance where the Mf group does 
show some degree of identification between 
Father and Ideal Self. A similar pattern of 
significance tests was obtained on the Love 
vector except that in this case the groups 
unanimously wanted to identify with the 
Love qualities perceived in their mothers and 
not in their fathers. 


DISCUSSION 


Our study tends to support the statements
made by Machover (1949). On the other
hand, our results are not in line with those
studies suggesting that certain conclusions 
cannot be drawn from a sample which draws 
its own sex first as against a sample which 
draws the opposite sex first. We find marked 
differences between these two groups. These 
differences occur in their perception of their 
self and ideal self as compared to their rat- 
ings of their parents. These are not gross dif- 
ferences but are, rather, of a selective nature. 
Some appear only in relation to qualities of 
dominance, others to love. 

We are in a position now to examine our
original hypotheses. Let us repeat each and
see if it is upheld or denied.

Hypothesis 1. Ss drawing the opposite sex 
first have different self-concepts (as revealed 
by the Leary Interpersonal Check List) from 
Ss drawing the same sex first. 

This was not supported. The self-concept is
measured by the Dominance and Love vectors,
and the means on these vectors were not
dissimilar enough to support this hypothesis,
although when the sexes were combined, the t
test between the mean female Love score
and the mean male Love score was significant
at the .05 level. This could be indicative of a
sex difference, although it may be attributable
to the larger N.

Hypothesis 2. Ss drawing the opposite sex 
first have greater similarity of self-concept 
with their conception of the parent of the op- 
posite sex than Ss drawing the same sex first. 

The Fm group rates the self more like the 
father on dominance qualities than does the 
Mf group Ss who also rate their self more 
like their mothers on dominance qualities than 
does the Mm group. The hypothesis is, there- 
fore, upheld at least for Dominance vectors. 

On the Love vector the hypothesis is only 
partially upheld. Ff and Fm groups are not 
different in how they perceive their self from 
their fathers. But there is a significant differ- 
ence between how the men rate themselves 
and their mothers. The Mf group sees its self 
having love qualities like their mothers’ love 
qualities, but the Mm group is significantly 
different in this respect. 

Hypothesis 3. Ss drawing the opposite sex 
first will show a greater discrepancy of the 
self-concept with their ideal self-concept than 
Ss drawing the same sex first.

This is not confirmed. Although on the
Dominance vector males and females both 
show a significant difference between their self 
and ideal self-concepts, this difference is by 
far more pronounced for Ff and Mm groups. 
On the Love vector the disagreement between 
self and ideal is in fact seen only for Ff and 
Mm groups. This could suggest that Ff and 
Mm groups have higher aspirations and are 
more ambitious for themselves, thereby not 
being easily satisfied. Or it could mean that 
drawing the opposite sex first reflects a 
healthier acceptance of one’s goals by either 
being more capable of attaining the ideal, or 
bringing the ideal down more in keeping with 
the way one finds himself. This would cer- 
tainly be a worthwhile subject for further re- 
search, especially in view of the work of 
Butler and Haigh (1954) in which thera- 
peutic progress was found to be a decrease in 
discrepancy between the self and ideal self- 
concepts. 

Hypothesis 4. This hypothesis was contin- 
gent upon the confirmation of Hypothesis 3 
and is hence rejected also. 

The hypotheses originally proposed have 
failed to deal with all of the resulting data. 
A few additional comments are in order. 

The most outstanding finding from this
study is that Ss of either sex, if they
draw their own sex first, will tend to be
basically similar regarding perceptions of self,
ideal self, and parents. This also tends to be
true for all Ss drawing the opposite sex first,
in that they too agree with each other on
these ratings.

Although most of the mean difference
comparisons yielded significant t's, none resulted
for the Father-Ideal Self comparisons on the 
Dominance vector for all four groups, sug- 
gesting that all Ss ideally want to be like the 
dominant qualities they perceive in their fa- 
thers. On the Love vector the Self-Father and 
the Mother-Ideal Self comparisons failed to 
reach significance in any of the groups, im- 
plying that all Ss attribute similar love quali- 
ties to themselves and their fathers, although 
ideally they would like to have the love quali- 
ties perceived in their mothers. 

We conclude that Machover's contention
has at least some inferential support from
this study.


SUMMARY 


One hundred and fourteen college under- 
graduate subjects, 57 males and 57 females, 
with an average college education of 2.75 years
were given the Draw-A-Person Test and the 
Leary Interpersonal Check List. The sex of 
the first drawn figure was noted and the rat- 
ings of the subjects for themselves, mothers, 
fathers, and ideal selves were scored and com- 
pared according to Leary’s Dominance and 
Love vectors. 

An analysis of the data suggests that dif- 
ferences exist between the groups drawing 
their own sex first from groups drawing the 
opposite sex first. Four hypotheses were tested 
and research suggestions made from these 
data. Of these four, three were rejected and 
only one was partially upheld. However, the 
pattern of differences between self-concept 
and concepts of ideal, of father, and of mother 
in the various groups was such as to provide 
some inferential support for Machover's position.


REPLICATED FACTORS ON THE MMPI WITH
FEMALE NP PATIENTS


WILLIAM J. EICHMAN

Veterans Administration Hospital, Roanoke, Virginia


Although factor analytic techniques have
been applied to Minnesota Multiphasic Personality
Inventory (MMPI) scales on a number
of occasions, there is no study in the literature
which employs female NP patients as
a subject group. Consequently, it is unknown
whether the factorial structure of the MMPI
with female patients is different from that
with male patients. Studies with male subjects
indicate essential agreement as to the
loadings on the first two factors although the
interpretations of the factors differ from one
study to the next. Welsh (1956) seems to
have conceptualized the two dimensions most
adequately as anxiety and repression. He
developed item scales for the two factors and
labeled them A and R. More than two factors
have been found in all reported studies
but the loadings have differed from one study
to the next. Welsh (1956) identified two further
factors in his study and developed item
scales for them. These scales were difficult
to interpret and have received little further
attention. Wheeler, Little, and Lehner
(1951) found "paranoid adjustment" and
"psychopathic adjustment" factors. Kassebaum,
Couch, and Slater (1959) found a
third factor which they labeled "tender
minded sensitivity."

With the single exception of Welsh, no one
has attempted to make practical use of the
factor studies. Authors have been almost
universally critical of the MMPI as a clinical
instrument because it seemed to measure
only two variables and took 12 or more scales
to do the job. To a very large extent this
criticism is a justifiable one although it should
be noted that a number of the scales have
only moderate communalities in the tables of
factor loadings. An example of this is found
in the recent study of Kassebaum et al.
(1959) in which 6 of the 12 scales have
communalities below .50 after the extraction of
three factors. These are the L, F, Hs, Hy, Mf, 
and Pa scales. Thus many of the scales have 
extremely large error variances or measure 
something which is unique to the particular 
scale. The positive results of a large number 
of predictive studies using these scales would 
support the latter interpretation. The end re- 
sult of the factor studies on the one hand and 
the predictive studies on the other leaves the 
practising clinician in a state of confusion. 
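The communality argument rests on simple arithmetic: with orthogonal factors, a scale's communality is the sum of its squared loadings, and the remainder of its variance is unique or error variance. The sketch below is illustrative only. It uses the Sample A centroid loadings printed for Pt and Mf in Table 3 of the present paper as read from a partly illegible copy; the sign of the fourth Mf loading cannot be read, which does not matter once the loading is squared.

```python
def communality(loadings):
    """Sum of squared loadings on orthogonal factors (h squared)."""
    return sum(x * x for x in loadings)

# Sample A centroid loadings on Factors I-IV, as read from Table 3.
pt_loadings = [0.91, 0.09, 0.27, 0.20]
mf_loadings = [0.26, -0.17, 0.16, 0.32]  # sign of the last entry is illegible

h2_pt = communality(pt_loadings)
h2_mf = communality(mf_loadings)

# Variance not shared with the common factors: uniqueness plus error.
unexplained_mf = 1.0 - h2_mf

print(f"Pt: h2 = {h2_pt:.2f}")
print(f"Mf: h2 = {h2_mf:.2f}, unexplained = {unexplained_mf:.2f}")
```

A scale such as Mf, with most of its variance outside the common factors, is the kind of case for which the choice between "error" and "something unique" has to be settled by predictive studies rather than by the factor analysis itself.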

The present study attempts to identify fur- 
ther factors in samples of female NP patients 
at a VA hospital. One of the flaws in previous 
studies is that few have been replicated using 
the same matrix of scales. The authors have 
hesitated to accept more than the first two 
factors as significant. Some have not used the 
validity scales (L, F, K), and others have 
employed a variety of additional scales be- 
yond the 9 original scales with little congru- 
ence of scales from one study to another. 
This study utilizes 17 scales in two samples 
of NP females. 


METHOD 


Subjects. Two samples (Ns = 62, 85) of female
patients were used. Aside from the fact that the first
group of records was collected earlier than the second,
the two groups did not appear to differ. The
mean age of the total group was 35.4 with an SD
of 8.3. Length of hospitalization ranged from a few
days to 10 years. The diagnostic groups represented
are presented in Table 1. Mean scores on the MMPI
scales for the combined sample are presented in
Table 2. Tests of significance between the groups on
each of the 17 scales showed a difference only on the
Mf scale and this was small in absolute magnitude.

Scales. Seventeen MMPI scales were used in each
sample. These were the 3 validating scales, the 9
clinical scales, the Taylor A scale (AT) (Taylor,
1953), the A and R scales (Aw and Rw) (Welsh,
1956), the Dependency scale (Dp) (Navran, 1954),
and the Ego Strength scale (Es) (Barron, 1953).

Statistical technique. Raw scores were used for
obtaining product-moment correlations. The factor
analysis was carried out by Thurstone's centroid
method. All rotations were done by simple graphical
procedures.


TABLE 1

PRIMARY DIAGNOSES OF TWO SAMPLES OF NP FEMALES

                                        Sample A    Sample B
Diagnosis                               (N = 62)    (N = 85)
Schizophrenia                              32          45
Schizophrenic, Post-lobotomy               ..          ..
Manic-Depressive                           ..           6
Psychotic Depression                        1           1
Character and Personality Disorders         3           7
Neuroses                                   17          21
Organics                                   ..          ..
Total                                      62          85

Note. Entries shown as .. are illegible in the source. One
Organics entry, 4, is legible, but its column is uncertain.


TABLE 2

MEAN MMPI T SCORES FOR TOTAL SAMPLE (N = 147)

  L     F     K     Hs    D     Hy    Pd    Mf    Pa    Pt    Sc    Ma
 54.3  59.7  55.4  57.2  64.4  63.3  66.0  49.8  63.4  55.6  58.6  58.6


RESULTS


Correlation matrices for the separate samples
are not presented since the only purpose
in using two analyses was to assess the
stability of factor loadings.


TABLE 3

CENTROID FACTOR LOADINGS FOR SAMPLES A AND B

         Factor I    Factor II   Factor III  Factor IV      h2
Scale    A     B     A     B     A     B     A     B     A    B
L       -48   -46   +50   +27   -17   -32    12   +19    52   42
F       +66   +63   +45   -40   -40   -42   +13   +19    82   77
K       -70   -66   +41   +53   +14   -14    18   +25    71   80
Hs      +80   +67   +34   +42   -13   -35    34    30    89   84
D       +69   +74   +45   +40   +36   +23   +21   +30    85   85
Hy      +60   +33   +52   +73   +05   -23    39    03    78   70
Pd      +65   +72   +14   -11   -17   -07   +20   +32    51   64
Mf      +26   +17   -17   +24   +16   +15    32    13    22   13
Pa      +70   +76   +16   -12   -17    25    08   +06    55   66
Pt      +91   +94   +09   -02   +27   +24   +20   +02    95   94
Sc      +89   +91   +18   -26   -17   -18   +25   +10    92   94
Ma      +42   +36   -31   -63   -42   -28    21    21    49   65
AT      +95   +93   +02   +10   +22   +26    03    07    95   95
Aw      +92   +92   -14   -20   +26   +23   +17   +03    96   94
Rw      -06   +14   +66   +62   +37   +10   +11   +34    59   53
Dp      +88   +91   -22   -10   +32   +26   +18    04    96   91
Es      -86   -80   -10   -14   -12   +18   +11   +15    77   71

Note. Decimal points omitted. Entries shown without a sign
carry no legible sign in the source.


TABLE 4

ROTATED FACTOR LOADINGS FOR SAMPLES A AND B

         Factor I    Factor II
Scale    A     B     A     B
L       -59   -58   +34   +20
F       +47   +58    20    23
K       -72   -76   +40   +47
Hs      +61   +45   +16   +27
D       +71   +68   +57   +62
Hy      +43   +11   +38   +63
Pd      +58   +69    05   +13
Mf      +25   +14    13   +19
Pa      +57   +68    03    01
Pt      +94   +95   +22   +16
Sc      +81   +88   +09    08
Ma      +29   +39   -52   -64
AT      +94   +92   +10   +22
Aw      +98   +97   +02    00
Rw      -05    00   +77   +73
Dp      +97   +94    04   +04
Es      -81   -68    11   -13

Note. Decimal points omitted. Entries shown without a sign
carry no legible sign in the source.


Four factors were extracted from the correlation
matrix of each sample. Further factors
were not extracted since the average
residual value in the fourth residual matrix
was .02. The largest residual value found at
this time was .08. The centroid loadings are
presented in Table 3. Orthogonal rotations
were then carried out by simple graphical
procedures. The rotated loadings are presented
in Table 4. Plotting the test loadings on the
factor vectors quickly indicated that a very
close approximation to simple structure could
be obtained by maximizing the loadings on
Aw and Rw for the first and second factors,
respectively. This procedure has the additional
advantage of objectifying the process
of rotation for the two samples and making 
the comparison of the two analyses more 
meaningful. Thus only the locations of the 
reference axes for Factors III and IV had to 
be determined by subjective means. This last 
rotation for each sample was done with 
Thurstone’s principle of simple structure in 
mind. Discussion and comparison of the four 
factors are presented below. 
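The first step of this rotation can be imitated numerically. The sketch below is not the graphical procedure used in the study; it assumes that placing Factor I through Aw and Factor II through the part of Rw orthogonal to Aw gives an equivalent result (a Gram-Schmidt step). The loadings are the Sample A centroid values from Table 3 as read from a partly illegible copy, with the fourth Rw loading taken as +.11, and the helper names are illustrative.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    length = sqrt(dot(v, v))
    return [a / length for a in v]

# Sample A centroid loadings on Factors I-IV (Table 3).
aw = [0.92, -0.14, 0.26, 0.17]
rw = [-0.06, 0.66, 0.37, 0.11]   # fourth value is a reading of a garbled entry
pt = [0.91, 0.09, 0.27, 0.20]

# New Factor I runs through Aw; new Factor II through what is left of Rw.
axis1 = unit(aw)
rw_on_axis1 = dot(rw, axis1)
axis2 = unit([r - rw_on_axis1 * a for r, a in zip(rw, axis1)])

def rotated(loadings):
    """Loadings of a scale on the first two rotated factors."""
    return dot(loadings, axis1), dot(loadings, axis2)

for name, row in (("Aw", aw), ("Rw", rw), ("Pt", pt)):
    first, second = rotated(row)
    print(f"{name}: I = {first:+.2f}, II = {second:+.2f}")
```

Under this assumption Aw falls entirely on Factor I and Rw almost entirely on Factor II, so only the positions of the remaining two axes are left to judgment.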


Comparison of Factor Loadings 


Factor I. The first factor obtained in both 
samples is obviously the same as that found 
in previous factor analyses of the MMPI. 
Loadings above .9 are found on Aw, AT, Dp,
and Pt in both samples. Correlating the pairs
of loadings results in a Pearson coefficient of 
.98 and a rho of .96. 
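The two indices of correspondence can be recomputed from the rotated loadings. The sketch below uses the Factor I columns of Table 4 as read from a partly illegible copy (the Sample A entries for L, K, Rw, and Es are reconstructions, an assumption of this example) and computes rho as a Pearson r on average ranks; the function names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks from 1 upward, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Factor I rotated loadings (x 100), scale order:
# L, F, K, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, AT, Aw, Rw, Dp, Es
sample_a = [-59, 47, -72, 61, 71, 43, 58, 25, 57, 94, 81, 29, 94, 98, -5, 97, -81]
sample_b = [-58, 58, -76, 45, 68, 11, 69, 14, 68, 95, 88, 39, 92, 97, 0, 94, -68]

r = pearson(sample_a, sample_b)
rho = spearman(sample_a, sample_b)
print(f"Pearson r = {r:.2f}, rho = {rho:.2f}")
```

The same two functions apply to the other three factors once the signs of the small, partly illegible loadings are settled.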

Factor II. Significant loadings (above .3)
are found for six scales of Factor II in Sample
A and for five of these same six in Sample
B. In order of magnitude, with the respective
loadings in parentheses, the scales are Rw
(.77, .73), D (.57, .62), Ma (-.52, -.64),
K (.40, .47), Hy (.38, .63), and L (.34, .20).
The Pearson r for the 17 pairs of loadings is
.87 and rho is .74. Thus considerable correspondence
is found for this factor in the two
samples. The scales which have the highest
loadings also indicate that the factor is highly
similar to the second or R factor found in
previous experiments.


TABLE 4 (continued)

         Factor III  Factor IV      h2
Scale    A     B     A     B     A    B
L        19   +07    17    24    53   44
F       +34   +16   +66   +59    81   76
K       +05   -03   -17   +07    71   80
Hs      +66   +74   +21    00    88   82
D       +06   +02   +11   +01    84   85
Hy      +65   +53   +07    01    76   69
Pd      +12    01   +40   +35    51   62
Mf      +25   +12   -29   +25    23   13
Pa      +38   +26   +28   +33    55   64
Pt       03   +08   +09    05    94   94
Sc      +15   +18   +48   +35    92   94
Ma      +32   +14   +21   +26    50   65
AT      +25   +16    01    14    96   94
Aw       00    00    00    00    96   94
Rw      -01    00    02    00    60   53
Dp       06    09   -07   +05    95   90
Es      -32   -47    05   -09    77   71

Note. Decimal points omitted. Entries shown without a sign
carry no legible sign in the source.

Factor III. The third factor showed six sig- 
nificant loadings in Sample A but only three 
in Sample B. Those scales which had significant
loadings in both analyses were Hs (.66,
.74), Hy (.65, .53), and Es (-.32, -.47).
The three scales which did not appear to be 
significant in the second sample were Pa (.38, 
.26), F (.34, .16), and Ma (.32, .14). That 
this difference between the two analyses is 
more apparent than real is indicated by a 
Pearson r of .95 between the pairs of loadings 
and a rho of .89. As a consequence, it was ac- 
cepted that the factor was satisfactorily repli- 
cated. 

Factor IV. Three scales attain significance 
on the fourth factor in both samples. These 
are F (.66, .59), Sc (.48, .35), and Pd (.40,
.35). An additional scale, Pa (.28, .33), passes
the significance point in Sample B. Although 
this factor can be accepted as similar in both 
samples, the correspondence is not nearly so 
great as in the other factors. The Pearson r 
is .70 and the rho is .58 between the sets of 
loadings. Fairly large differences in absolute
loadings occurred on two scales, Mf (-.29,
.24) and K (-.17, .07); and large relative
differences occurred on others, e.g., Hs from
fifth highest to twelfth highest, Pt from ninth
to fifth highest, and Dp from fifteenth to
ninth highest.


Combined Results 


Since close identity was found for the first 
three factors in both samples and similarity 
was found for the fourth, the two samples 
were combined and the results were reana- 
lyzed as described previously. The correlation 
matrix for the combined sample, the centroid 
loadings for the data, and the rotated loadings
are presented elsewhere.1


Interpretation of Factors 


Factor I. This factor accounts for 63.1%
of the common factor variance in the table
of intercorrelation. The high loadings on Aw,
AT, Dp, and Pt indicate that it is identical
with factors found in previous studies. It appears
to be a general maladjustment, anxiety,
and/or complaint factor.
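A factor's share of the common factor variance is the sum of its squared loadings divided by the sum of the communalities. The combined-sample loadings are not reproduced in this paper, so the sketch below applies the same arithmetic to the Sample A rotated loadings and communalities of Table 4, as read from a partly illegible copy (some Factor I entries are reconstructions). It illustrates the computation and does not rederive the 63.1% figure.

```python
# Sample A, Table 4 (x 100), scale order:
# L, F, K, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, AT, Aw, Rw, Dp, Es
factor_1 = [-59, 47, -72, 61, 71, 43, 58, 25, 57, 94, 81, 29, 94, 98, -5, 97, -81]
h2 = [53, 81, 71, 88, 84, 76, 51, 23, 55, 94, 92, 50, 96, 96, 60, 95, 77]

def share_of_common_variance(loadings, communalities):
    """Sum of squared factor loadings over total communality (inputs x 100)."""
    factor_variance = sum((x / 100) ** 2 for x in loadings)
    common_variance = sum(c / 100 for c in communalities)
    return factor_variance / common_variance

share = share_of_common_variance(factor_1, h2)
print(f"Factor I, Sample A: {100 * share:.1f}% of common factor variance")
```

The Sample A figure obtained this way lies close to the combined-sample percentage reported above, as would be expected if the first factor is stable across samples.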

Factor II. The second factor accounts for 
16.0% of the common factor variance. Since 
it has its greatest loading on Rw, we may as- 
sume that it is highly similar, if not identical, 
to Welsh’s repressive-expressive factor. The 
other scales with high loadings are D (.62), 
Hy (.56), Ma (—.55), and K (.41). These 
loadings are consistent with the interpreta- 
tion of a bipolar factor with repression at one 
extreme and expression at the other. 

Factor III. This factor accounts for only
9.6% of the common factor variance in the
study but showed stability from one sample
to the second. In the combined sample analysis,
only three scales had substantial loadings:
Hs (.69), Hy (.60), and Es (-.38). The common
variable in the Hs and Hy scales is an
expression of physical symptomatology. Barron
(1953) described 11 of his 68 items on
the Es scale as relating to physical function.
Thus we can tentatively label this factor as a
somatization variable. There may be other
correlates of the factor such as are represented
by the subtle items of the Hy scale
(Wiener, 1948), those whose content deals
with character formation rather than with
physical symptomatology, but this cannot be
established by the present study.


1 Tables of the correlation matrix, centroid loading,
and rotated loadings for the combined sample
have been deposited with the American Documentation
Institute. Order Document No. 6465 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress; Washington 25,
D. C., remitting in advance $1.25 for 35 mm. microfilm
or $1.25 for 6 X 8 in. photocopies. Make checks
payable to: Chief, Photoduplication Service, Library
of Congress.

Factor IV. Factor IV accounts for 11.3%
of the common factor variance in the study
but is apparently stable from one sample to
another. High loadings are found on five
scales: F (.70), Sc (.48), Pa (.44), Ma
(.37), and Pd (.34). These scales when
found to be high on a clinical profile are usually
interpreted as representing a psychotic
level of functioning and/or a potential for
the acting-out of impulses. The writer prefers
to label the factor as an acting-out tendency
at the present time since clinical experience
indicates that many patients with character
disorders also have high scores on these
scales.


Comparison with Other Studies 


Two studies were selected for comparison 
with the present study on the basis that simi- 
lar scales were included in the matrix. Both 
studies (Kassebaum et al., 1959; Fisher, 
1957) used male subjects, college students in 
the first instance and VA medical and psy- 
chiatric samples in the second. This compari- 
son should partially answer the question as to 
similarity of factor structure of the MMPI 
between male and female subjects. The Aw 
and Rw factors are not discussed because they 
show such striking similarity in all studies. 

Somatization factor. Kassebaum labeled his 
third factor “tender minded sensitivity,” par- 
tially because of loadings on scales not in- 
cluded in the present study. Of the 16 scales 
used in both studies, the two highest load- 
ings in both cases are on Hs and Hy, indicat- 
ing that some similarity exists between the 
factors. A Pearson r of .69 was obtained for 
the two sets of loadings. Fisher separately 
analyzed a medical and a psychiatric sample, 
both samples being drawn from a general 
rather than an NP hospital. It is not sur- 
prising that his second factor resembled our 
somatization factor and was so labeled by 
him. The Pearson r between his medical sample
and the present study was .73 and between
his psychiatric sample and the present
study it was .83. Thus some correspondence
among factors on all three studies was found. 
This correspondence was at a maximum when 
the subjects were psychiatric patients and at 
a minimum when the subjects were college 
students. Closer correspondence could prob- 
ably be obtained by rotating the factors of 
other studies so that Aw and Rw would be 
at a maximum as was done in this study. 

Acting-out factor. The fourth factor found 
in this study resembles a fifth factor found 
by Fisher and described as a social alienation 
factor with psychotic implications. The corre- 
spondence between the acting-out factor de- 
scribed above and Fisher’s social alienation 
factor in the psychiatric sample is particularly
close; in both cases the same five scales
have loadings above .3. These are F, Pd, Sc,
Pa, and Ma. The Pearson r for the 16 pairs 
of loadings is .70. As indicated previously, 
rotations maximizing Aw and Rw would likely 
result in an increased correlation. 


DISCUSSION 


The most important finding of this study 
appears to be the stability of factor loadings 
on replication with a similar sample of sub- 
jects. The somatization and acting-out fac- 
tors found here each account for approxi- 
mately 10% of the common factor variance, 
an amount which several investigators have 
deemed not worthy of interpretation. Never- 
theless, the factors, as presently interpreted, 
seem to represent frequently observed behav- 
iors in patients and appear to be worthy of 
objective measurement. The finding that the 
factors account for so little of the common 
factor variance is the result of the original 
construction of the test. For example, if 
scales were constructed separately to measure 
the different psychophysiological reactions, we 
might expect to find a common factor of 
somatization which would account for a very 
large amount of the common variance of the 
table. If, in the same table of intercorrela- 
tions, only two measures of general malad- 
justment were included, we would find that 
this factor accounted for very little of the 
common variance. 

We may conclude that the MMPI, as pres- 
ently scored with the clinical and validity 
scales, is overloaded with measures of general 
maladjustment and that other more pure
scales can be profitably constructed. The
second factor found in this study can be 
measured fairly well by the Rw scale. The 
third factor (somatization) can be fairly well 
measured by the altitudes of the Hs and Hy 
scales on the MMPI profile. The loadings of 
these scales on Factor III are greater than 
their corresponding loadings on the general 
maladjustment factor. The five scales which 
have respectable loadings on Factor IV (act-
ing-out) have high loadings on the general
maladjustment factor as well. Consequently 
it seems that this factor cannot be so easily 
distinguished from general maladjustment 
when the clinician is faced with an individual 
profile. Thus if we hope to measure acting- 
out potential with the MMPI, it seems very 
desirable to construct a scale which is uncon- 
taminated with the general maladjustment 
factor. 

The factorial composition of the Ay scale, 
the Dp scale, and the Es scale shows that 
these measures overlap to a very considerable 
extent with the Aw and Pt scales. A substan-
tial body of literature has grown up around
several of these scales as if something unique 
were being measured. It would seem wise for 
the person who develops a new scale to do 
correlational studies with already existing and 
validated scales. 

We can expect similar developments in the 
future, i.e., many new scales will be created 
and many of these will be near duplicates of
those already in existence. An example of this 
is to be found in the recent paper by Kasse- 
baum et al. (1959) in which 19 nonclinical 
scales were included in a factor analysis with 
the original MMPI scales. Excluding Aw and 
Rw, the average factor loading of the remain- 
ing 17 scales on the general maladjustment 
factor is .66. The same average on the second 
or repression factor is .31. Squaring each of
these average factor loadings and summing
gives a communality of .53. Thus approxi-
mately 50% of the total
variance of these new scales is wasted on fac- 
tors which are measured much better by other 
scales. The supposed “nonclinical” scale is 
often as much affected by this contamination 
of general maladjustment and repression as 
is the clinical scale. In fact the nonclinical 
scale designed to arrive at the strengths of 
individual subjects is often a clinical scale 
turned upside down. Examples drawn from
the above mentioned paper include the Lead- 
ership scale with a loading of —.85 and the 
Tolerance scale with a loading of —.80 on 
the first factor. In the analysis presented 
here, the only extreme example is the Dp 
scale although the Es scale has a high load- 
ing on Factor I with a dash of Factor III 
(somatization) thrown in. 
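
The communality figure quoted above can be checked directly. The sketch below uses only the two average loadings given in the text; the variable names are illustrative.

```python
# Average loadings of the 17 remaining scales, as reported in the text.
avg_loading_general = 0.66      # general maladjustment factor
avg_loading_repression = 0.31   # repression factor

# Communality from two orthogonal factors: sum of squared loadings.
communality = avg_loading_general ** 2 + avg_loading_repression ** 2
print(f"{communality:.4f}")     # 0.5317
```

Of the .53, about .44 comes from the general maladjustment factor alone and about .10 from repression.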

The above discussion can be summed up as 
an argument against the endless accumulation 
of scales on personality inventories, scales 
which purport to measure one thing but 
which actually measure something else much 
better. The wiser course of action would seem 
to be checking out each item of a new scale 
against certain basic scales, particularly the 
general maladjustment factor in any of its 
several forms. Each new item should at least 
correlate more highly with the criterion 
against which the scale is being developed 
than it does with the general maladjustment 
variable. Without this precautionary measure, 
there will be a piling up of variance associ- 
ated with general maladjustment to the ex- 
tent that the end result is a good measure of 
the wrong thing. 
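
The screening rule proposed above can be sketched as follows. The function names and the response data are hypothetical, and comparing absolute correlations is an assumption made here so that reverse-keyed items are treated the same way; the text itself states only that an item should correlate more highly with the criterion than with general maladjustment.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def keep_item(item, criterion, general_maladjustment):
    """Retain an item only if it tracks the criterion more closely than
    it tracks the general maladjustment variable (assumed: absolute r)."""
    r_criterion = abs(pearson_r(item, criterion))
    r_general = abs(pearson_r(item, general_maladjustment))
    return r_criterion > r_general

# Hypothetical data for eight subjects.
criterion = [1, 1, 1, 1, 0, 0, 0, 0]             # criterion group membership
general = [20, 22, 18, 21, 19, 23, 20, 17]       # general maladjustment score
item_a = [1, 1, 1, 0, 0, 0, 1, 0]                # follows the criterion
item_b = [0, 1, 0, 1, 0, 1, 1, 0]                # follows general maladjustment

print(keep_item(item_a, criterion, general))     # True
print(keep_item(item_b, criterion, general))     # False
```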

An argument can be made that the general 
maladjustment factor itself is not a pure scale 
and this appears to be valid. Comrey (1958) 
has found at least seven significant factors in 
an item analysis of the Pt scale which, as 
previously indicated, is a good measure of the 
first factor. An argument can then be made 
that these subcategories of general maladjust-
ment need to be measured independently and
that the general maladjustment factor is 
merely a global construct which would be 
more useful if broken down into its com- 
ponent parts. This matter can only be re- 
solved by empirical investigation. A practical 
hindrance is the fact that only a few items 
are represented in each of Comrey’s subscales 
and more would have to be found in order 
to make a practical evaluation of predictive 
power. 


SUMMARY 


Seventeen scales from the MMPIs of fe- 
male NP subjects were factor analyzed and 
replicated. Four factors emerged clearly in 
both samples. These were labeled tentatively 
as anxiety, repression, somatization, and act-
ing-out. Comparisons were made with two fac-
torial studies of male subjects and consider- 
able correspondence was found for all four 
factors. Similarity of factor structure was 
greatest when NP male patients were com- 
pared with NP female patients and least when 
male college students were compared with fe- 
male NP patients. 

The results of the study seem to indicate a 
clear need for the construction of pure scales 
to measure the third (somatization) and fourth 
(acting-out) factors. Existing scales which re- 
late to these dimensions of behavior are highly 
correlated with first and second factor scores 
and cannot easily be interpreted. 

A further implication of the results is that 
careless construction of new empirical scales 
has resulted in near duplicates of the anxiety 
scales which makes them relatively useless. 
Additional disadvantages are that these new 
scales are variously named according to the 
particular criterion employed and that inde- 
pendent bodies of research tend to build up 
around them. 

Yates (1954, p. 374) has voiced the criti-
cism that many investigators seem to view
brain damage as a “unitary factor.” Klebanoff, 
Singer, and Wilensky (1954) have suggested 
that a major reason for lack of agreement in 
the results of studies of psychological impair- 
ment related to organic brain damage or dis- 
ease in large part may reflect differences in 
type, locus, or severity of the brain lesions 
represented in the samples studied. Birch and 
Diller (1959) point out that “a clear view of 
the evidence is made difficult, or even impos- 
sible, by the fact that the various parameters 
of cerebral dysfunction have not been ex- 
amined systematically” (p. 188). Macroscopic 
and microscopic studies as reported in neu- 
rology textbooks such as Wechsler (1958) 
indicate that the type or severity of brain 
lesions may cause marked differences in the 
organic condition of the brain. The detri- 
mental effects upon adaptive abilities due to 
acutely destructive lesions such as intrinsic 
tumors or cerebral vascular accidents may be 
more dramatic than the effects of relatively 
static conditions such as healed head wounds 
or slowly progressive conditions. The present 
study was designed to investigate psychologi- 
cal deficits in relation to acuteness of organic 
brain lesions. 


METHOD 
Subjects 


Four groups, each consisting of 16 hospitalized 
patients, were studied. Corresponding subjects were 

1 This investigation was supported in part by Re- 
search Grant B-1468 from the National Institute of 
Neurological Diseases and Blindness, United States 
Public Health Service. 

The writers are indebted to Maryellen Means for
assistance with the statistical computations.
individually matched as closely as possible according
to chronological age, sex, race, and years of educa-
tion. Three groups were composed of patients diag-
nosed as having organic brain damage or disease.
Diagnoses were based upon detailed medical his- 
tory, electroencephalography, neurological examina- 
tion, and, when further clarification was needed, 
angiography, pneumography, and repeated neuro- 
logical examinations. The fourth, or Control group,
was composed of patients in whom organic brain
damage was confidently ruled out on the basis of
similar, although generally less extensive, clinical
diagnostic procedures.

One brain damaged group (Acute) was composed 
of patients who had acute neurological illnesses and 
whose neurological signs and symptoms were pres- 
ent at the time of psychological testing. These pa- 
tients had experienced a specific, temporally defined, 
episode during which their current neurological find- 
ings had arisen, or had developed a rapidly progres- 
sive brain disease with steady progression of neuro- 
logical signs. A second brain damaged group (Rela- 
tively Static) was composed of patients who had 
either recovered from acute neurological signs if 
there had been an acute onset of symptoms, or who 
had slowly progressive brain disease without evi- 
dence of acute or sudden onset. Among this group, 
the patients with sudden onset of brain dysfunction 
(e.g., penetrating head injury) had with the passage 
of time recovered from acute neurological deficits, 
suggesting reorganization of brain functions and a 
relatively static condition of the brain. The third 
brain damaged group (Chronic-Static) was com- 
posed of patients with chronic, long-standing brain 
dysfunction who were institutionalized in a state 
hospital for patients with neurological disorders. The 
diagnoses of all patients in this group included some 
form of epilepsy. None of the other groups included 
institutionalized patients. Diagnoses of the patients
in the four groups are presented in Table 1. 

Differences between the mean ages and mean num- 
ber of years of education among the groups did not 
approach statistical significance. The mean ages, in 
years, were: Acute, 32.62 (SD 10.13); Relatively 
Static, 33.88 (SD 10.39); Chronic-Static, 32.88 (SD 
10.84) ; and Controls, 32.38 (SD 10.82). Mean years 
of education for the groups, in the same order, were: 
TABLE 1
DIAGNOSTIC DISTRIBUTIONS WITHIN BRAIN DAMAGED AND CONTROL GROUPS

Acute (N = 16)
  Acute subdural hematoma                              1
  Astrocytoma                                          3
  Cerebral vascular accident                           3
  Glioblastoma multiforme                              3
  Metastatic carcinoma, cortical                       2
  Postoperative arteriovascular malformation           1
  Preoperative meningosarcoma                          1
  Recent penetrating head injury                       2

Relatively Static (N = 16)
  Cerebral arteriosclerosis                            1
  Chronic subdural hematoma                            1
  Closed head injury                                   1
  Cortical atrophy                                     2
  Healed cortical abscess                              1
  Healed penetrating head wound                        1
  Multiple sclerosis                                   5
  Posttraumatic concussion                             1
  Psychomotor epilepsy                                 3

Chronic-Static (N = 16)
  Convulsive disorder due to infectious disease        3
  Convulsive disorder (grand mal) due to unknown cause 7
  Posttraumatic convulsive disorder                    4
  Psychomotor epilepsy                                 2

Controls (N = 16)
  Cancer of nasopharynx                                1
  Character disorder                                   2
  Facial laceration                                    1
  Neurological complaints without CNS disease          2
  No clinical disorder found                           1
  Non-CNS surgery                                      2
  Paraplegia                                           2
  Psychoneurosis                                       2
  Recurrent lumbar disc disorder                       1
  Schizophrenic reaction                               1
  Superficial occipital osteoma                        1


9.69 (SD 2.36); 9.31 (SD 2.08); 9.00 (SD 2.83);
and 9.06 (SD 3.03).

Procedure

All patients were administered the Wechsler-Belle-
vue Intelligence Scale, Form I, and seven of the
measures described by Halstead (1947) as indicators
of biological intelligence. The seven Halstead indi-
cators used were those found by Reitan (1955b) to
be the most sensitive for differentiating between sub-
jects with and subjects without evidence of organic
brain damage. Additionally, a composite score (Im-
pairment Index) was computed for each subject
based upon the number of Halstead variables on
which the subject’s performance ranked within the
range characteristic of brain damaged individuals.

In order to facilitate group comparisons and
equalize variability on the several measures on each
variable, the raw scores from all groups were pooled
and ranked poorest to best performance. These ranks
were converted to normalized standard scores (T
scores). Since the groups had been equated by
matching individuals, any two groups could be com-
pared by calculating the mean of the T score dif-
ferences between the corresponding individuals in
the two groups. This mean difference, in turn, was
evaluated by Student’s t. Also, in order to present


TABLE 2

WECHSLER-BELLEVUE SUMMARY SCORES AND STANDARD DEVIATIONS
ACCORDING TO ACUTENESS OF LESION

                  Acute       Relatively Static   Chronic-Static      Control
IQ             Mean    SD       Mean    SD          Mean    SD       Mean    SD
Full           80.38  13.99     92.81  17.07        90.38  14.68    108.81  10.16
Verbal         80.31  18.45     95.12  15.70        88.88  15.13    105.38  10.90
Performance    84.44  13.88     91.81  18.27        93.75  13.05    111.38  11.15
FIG. 1. Graphic presentation of mean T score values on Wechsler-Bellevue vari-
ables for control group and three brain damaged groups.


the intelligence quotients in a familiar manner, mean
scores and standard deviations for these variables
were computed from the raw scores.
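
The scoring procedure described above (pooling, ranking, conversion to normalized T scores, then a matched-pair t) can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the raw scores are hypothetical, higher raw scores are assumed to mean better performance, tied scores receive the mean of their ranks, and ranks are mapped to percentiles with the (rank - 0.5) / N convention, none of which is specified in the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def normalized_t_scores(pooled):
    """Rank pooled raw scores (poorest = rank 1) and convert the ranks to
    normalized T scores (mean 50, SD 10). Ties share the mean rank."""
    n = len(pooled)
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    unit_normal = NormalDist()
    # Assumed percentile convention: (rank - 0.5) / N.
    return [50 + 10 * unit_normal.inv_cdf((r - 0.5) / n) for r in ranks]

def paired_t(a, b):
    """Student's t for the mean difference between matched individuals."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical raw scores for four matched pairs (control[i] matches acute[i]).
control_raw = [12, 15, 14, 18]
acute_raw = [9, 11, 10, 13]

t_scores = normalized_t_scores(control_raw + acute_raw)
control_t, acute_t = t_scores[:4], t_scores[4:]
t_ratio = paired_t(control_t, acute_t)      # df = number of pairs - 1
print(round(t_ratio, 2))
```

With 16 matched pairs per comparison, as in the present study, each t ratio would be evaluated on 15 degrees of freedom.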


RESULTS 

Means and standard deviations of the Full, 
Verbal, and Performance scale IQ scores are 
presented in Table 2. Only the Acute group’s 
mean IQ scores were consistently below the 
range of 90-109. 

The general trends of performances of the 
four groups on the Wechsler-Bellevue vari- 
ables may be seen in Figure 1. The Control 
group performed at levels consistently su- 
perior to those of the brain damaged groups. 

The mean IQ score differences between Con- 
trols and brain damaged groups were signifi- 
cant at, or well beyond, the .05 level (see 
Table 3). Among the brain damaged groups, 
mean scores for the Acute lesion group were 
generally inferior to those of the two static 
lesion groups. Mean Verbal IQ of the Rela- 
tively Static group was higher than that of 
the Acute group (p < .05), and mean Per- 
formance IQ of the Chronic-Static group was 
higher than that of the Acute group (p <
.05). Two of the three brain damaged groups
(Acute and Chronic-Static) obtained slightly 
higher Performance than Verbal mean IQ 
scores (see Table 3). 


On only 2 of the 11 Wechsler subtests did 
the mean difference scores between Controls 
and Acutes fail to exceed the .01 level of 
significance; and the mean difference scores 
were significant beyond the .05 level on those 
two subtests (Digit Span and Picture Ar- 
rangement). Comparisons between Controls 
and each of the static lesion groups also 
yielded significant differences on several sub- 
tests (see Table 3). 

The general trends of the four groups on 
the Halstead Neuropsychological measures 
may be seen in Figure 2. As with the Wechs- 
ler-Bellevue, the Control group performed 
at levels consistently exceeding those of the 
brain damaged groups. Among the latter 
groups, the two static lesion groups per- 
formed at fairly comparable levels, although 
their mean scores generally exceeded those of 
the Acute group. 

On all but two of the eight Halstead vari- 
ables, mean difference scores were significant 
beyond the .001 level when the Control and 
Acute groups were compared. On the two re- 
maining variables, Speech-Sounds Perception 
and Finger Oscillation, the Control and Acute 
groups were differentiated beyond the .01 and 
.05 levels, respectively (see Table 4).

Controls were differentiated from Chronic- 





K. B. Fitzhugh, L. C. Fitzhugh, and R. M. Reitan 


TABLE 3 


t RATIOS BASED UPON DIFFERENCES BETWEEN EQUATED
PAIRS ON WECHSLER-BELLEVUE VARIABLES


Test Variable 
Full IQ 
Verbal IQ 
Performance IQ 
Information 
Comprehension 
Digit Span 
Arithmetic 


Control 
vs. 
Acute 


5.14°*00 
4.20**** 
613°" 
sar" 
3.66*** 
2.20* 


4 75**** 


Control 
vs. 
Relatively 
Static 


3.41*** 
aa* 

2 .738*** 
36 
70 
.06 

1.81 


Control 
vs. 
Chronic 


4.22**** 
ES oe 
4.47¢*** 
4.340009 
2.95*** 
1.39 

1.88 


Acute 
vs. 
Relatively 
Static 


87 
SY 
16 
16 


55 


Chronic 
vs. 
Relatively 
Static 


43 
1.15 
39 
3.04*** 
1.05 
.29 
04 


Similarities 
Vocabulary 

Picture Arrangement 
Picture Completion 
Block Design 

Object Assembly 
Digit Symbol 


i_— 1.73 
4.10**** 92 
2.86** 2.47* 
— 1.17 
a sm 
sor" 352°" 
5 16**** 3.92*** 


*p < .05
**p < .02
***p < .01
****p < .001


Statics on all Halstead variables except 
Speech-Sounds Perception; and Controls were 
significantly differentiated from Relatively 
Statics on five of the eight Halstead variables. 


Every brain damaged group was differentiated
from Controls on the composite measure, Im-
pairment Index, at levels exceeding the .01
level. Such differentiation is consistent with
the findings of Reitan (1959a) on a hetero-
geneous group of brain damaged patients.

FIG. 2. Graphic presentation of mean T score
values on Halstead Neuropsychological measures for
control group and three brain damaged groups.

On the 22 test variables studied the two 
static groups differed significantly from each 
other on only one, the Information subtest of 
the Wechsler-Bellevue. This particular differ- 
ence may be considered suggestive of the ef- 
fects of institutionalization upon the Chronic- 
Static group. In contrast, the Acute group 
performed significantly less well than one or 
both of the static groups on several variables. 
Differentiation occurred at levels exceeding 
the .05 level on the Wechsler-Bellevue vari-
ables of Arithmetic, Similarities, Verbal IQ,
and Performance IQ. The Halstead Indicators 
of Memory and Location variables of the 
Tactual Performance Test, and the Seashore 
Rhythm Test differentiated Acutes from one 
or both of the static groups at levels exceed- 
ing the .05 level of significance. 


DISCUSSION 
As Rosvold (1959) pointed out recently: 
studies with respect to the effect of brain damage on 


general intelligence, though more rigorous than in 
the past, are no more in agreement than were ear-
lier studies, some of which claimed deterioration,
others not (p. 434).


TABLE 4


t RATIOS BASED UPON DIFFERENCES BETWEEN EQUATED PAIRS ON
HALSTEAD NEUROPSYCHOLOGICAL INDICATORS


Control
Control vs
vs. Relatively
Indicator Acute Static
4.90****
6.02****
2n°ee* 85
72%*** 3 04***
66**** 1.94
38*** 1.47
55* 3 27***


3 31%**


61**
‘ 51****


Category
TPT Time
TPT Memory
TPT Location
Rhythm
Speech
Finger Oscillation
Impairment Index


5o****


*p < .05
**p < .02
***p < .01
****p < .001


The basis for disagreement may relate in part 
to differences in the types of brain damage 
studied. Many studies in this area have used 
subjects with brain damage resulting from 
head injuries (Aita, Armitage, Reitan, & 
Rabinovitz, 1947; Milner, 1956; Ross, 1958; 
Teuber & Mishkin, 1954; Weinstein & 
Teuber, 1957). (These and other references 
in this section are illustrative rather than 
exhaustive.) In some instances the patients 
have been in the intermediate recovery pe- 
riod when tested (Aita et al., 1947), in 
others (notably Teuber and his co-workers) 
at least several years have elapsed since the 
head trauma occurred, and in temporal lobe 
epilepsy (Milner, 1956) brain damage prob- 
ably occurred in most instances at birth or in 
childhood. In terms of EEG tracings, evi- 
dence has been presented to indicate that 
many patients recover from head injuries, 
gradually approaching more normal results 
(Jasper, Kershman, & Elvidge, 1945). 

The variation in type and severity of brain 
damage associated with developing brain dis- 
ease processes has been well documented 
(Wechsler, 1958). Relatively fewer investiga- 
tors have used such patients for psychologi- 
cal evaluations (Battersby, Krieger, & Bender, 
1955; Halstead, 1947; Morrow & Mark, 
1955; Reitan, 1955a). Again, the diversity 


Acute Chronic 
Control vs. Chronic vs 
vs. Relatively vs. Relatively 
Chronic Static Acute Static 
3.90*** 82 
5.62**** < 57 
2.45* 0* 62 
3.96*** ia ; 86 
+ 
3 
) 
i 


87 


2.59° 
1.46 r 
4.50°*** 


5.17**** 


2 
1 
4 


1.27 


of these conditions, either within or between
diagnostic categories, is well known to neu-
rologists, neurological surgeons, and neuro-
pathologists. It is not beyond the scope of
reasonable possibility that different diagnostic
conditions may reflect themselves differently
in psychological testing. In fact, Reitan
(1959b) has recently demonstrated that in-
ferences based on psychological test results
alone (without reference to anamnestic ma-
terial or other findings) identify patients with
head trauma, brain tumors, cerebrovascular
accidents, and other diagnostic conditions at
levels far exceeding chance expectancy. The
dependent variables, or psychological tests,
have also frequently varied from one study to
another among different investigators. This
factor also would contribute a certain amount
of variance to the conclusions drawn.

Because of the impossibility of simultane-
ous manipulation of the many factors that
are probably relevant, the results of any
single study in this area must be viewed as
tentative. The present findings, however, agree
with certain others in which the same instru-
ments were used in indicating that the effects
of brain damage may be measured reliably
(Fitzhugh, Fitzhugh, & Reitan, in press;
Kløve, 1959; Kløve & Reitan, 1958; Reitan,
1955a, 1955b, 1958). Additionally, among
the brain damaged groups consistent trends
were observed revealing greater psychological
impairment in patients who suffered from
acute organic brain damage or disease than 
in groups with relatively static organic dam- 
age. The results suggest that the nature of the 
brain lesion at the time of psychological test- 
ing is an important variable and should be 
considered in studies of psychological deficits 
in association with brain damage. 


SUMMARY 


One Control group and three brain dam- 
aged groups, each composed of 16 patients, 
were compared on the Wechsler-Bellevue In- 
telligence Scale variables and eight Halstead 
Neuropsychological indicators in order to in- 
vestigate psychological impairment in relation 
to acuteness of organic brain dysfunction. The 
Control group’s performances consistently ex- 
ceeded the performances of the brain dam- 
aged groups. Also, two static lesion groups 
(one institutionalized) rather consistently per- 
formed at levels superior to the levels of the 
Acute lesion group. The results suggested that 
acuteness of the organic brain lesions is an 
important variable to be considered in studies 
of psychological deficits among brain dam- 
aged subjects. 

There is increasing evidence that everyone
dreams approximately five times every night 
(Aserinsky & Kleitman, 1955; Dement, 1955; 
Dement & Kleitman, 1957; Dement & Wol- 
pert, 1958). Yet even when there is motiva- 
tion to recall dreams, as in psychotherapy or 
research, some people recall no dreams at all 
(Schonbar, 1959), and none report anywhere 
near the maximum possible. Why are so many 
dreams lost? What characterizes those which 
are remembered? 

The present study is based upon the ob-
servations of the investigators cited above
that dreams occur intermittently during the
whole sleep cycle (except in the first hour),
that they are associated with lighter phases
of sleep as indicated by EEG, and that they
tend to get longer during the night. The
study grew out of Freud’s theories concern-
ing the function of dreams and of a theory
arising from Freud’s.

According to Freud (1949, 1953, 1957), a
major function of the dream is to preserve the
sleep of the dreamer. In sleep, the ego gives
up its cathexes in both the external and in-
ternal worlds; the unconscious or id, how-
ever, does not sleep, and, because of the re-
laxation of the censorship of the somnolent
ego, becomes more able to intrude its desires
upon the individual. Were these wishes to be
expressed in undisguised form, they would
create sufficient anxiety to awaken the sleeper.
Freud therefore considers the dream to be an
economical compromise, with the forbidden
impulses allowed expression in disguised form,
experienced as objective rather than subjec-
tive events, thus not demanding full censor-
ship, but allowing sleep to continue.


If the demand made by the unconscious is too great,
so that the sleeping ego is not in a position to ward
it off by the means at its disposal, it abandons the
wish to sleep and returns to waking life . . . every
dream is an attempt to put aside a disturbance of
sleep. . . . This attempt can be more or less com-
pletely successful; it can also fail—in which case the
sleeper wakes up, apparently aroused by the dream
itself (Freud, 1949, pp. 56-57). (Quoted by permis-
sion of Norton.)

Such failures are identified as anxiety-dreams 
(Freud, 1953). 

Related to a part of Freudian theory is 
Gutheil’s view (1951) that the dream serves 
to protect the integrity of the ego. Gutheil 
proposes that dreams are most likely to oc-
cur just as the individual is falling asleep and
just as he is awakening, so that the ego is able 
to make gradual adjustments to the differing 
demands of the two states and is not pressed 
into abrupt, and possibly disintegrating, 
changes in function. Gutheil predicts further 
that dreams from the falling asleep period 
are less likely to be remembered than those 
from just before waking because of the long 
period of unconsciousness which intervenes in 
the former case. 

For the most part, the above views grew 
out of the analysis of retrospectively recalled 
dreams, mostly of patients in psychotherapy 
or of Freud himself. There was no way of 
knowing then that these dreams were merely 
a sample of a much larger and determinable 
number. The present study is concerned with 
testing some propositions based in theory, but 
in terms of selective recall, since this is the 
significant factor in what is available to us 
under nonlaboratory conditions. 

The first two hypotheses to be tested are 
that more dreams are remembered as having 
preceded a waking period than as having 
preceded continued sleep, and that propor-
tionately more dreams are remembered as oc-
curring before normal morning waking than
in the rest of the sleep cycle. These predic- 
tions arise from Gutheil’s views. Since, how- 
ever, a waking period following a dream pro- 
vides the opportunity for conscious repetition 
or motivated recall of the dream, and since 
the recall of morning dreams is in addition 
favored by their length, recency, assumed 
greater rationality (Gutheil, 1951), and the 
lack of intervening factors in time, support 
of these hypotheses will not necessarily sup- 
port Gutheil’s views; lack of support will 
throw considerable doubt upon them. It is 
true, for example, that relatively few dreams 
are recalled from the falling asleep period 
(Ramsey, 1953), as Gutheil suggests, but it 
is also true that dreaming does not seem to 
occur during this period (Dement & Kleit- 
man, 1957; Dement & Wolpert, 1958), thus 
negating part of Gutheil’s theory. 

Freud (1953), referring primarily to pa- 
tients’ dreams, said that forgetting dreams is 
due to resistance. Since he also equates dream 
processes with symptom formation (1953, pp. 
581-582), and since resistance in therapy is 
closely related to the process of repression in 
general, then certain predictions can be made 
concerning the likely characteristics of dreams 
which do not fall victim to repression. If we 
assume that dreams which awaken the sleeper 
have “failed” in their function because the 
anxiety aroused by the unconscious strivings 
is too great to allow sleep to continue, then it 
seems reasonable to assume that dreams which 
have been retained, finding their way into 
consciousness, have also “failed,” although to 
a lesser degree, to keep the unconscious im- 
pulses at bay, since the material, although 
disguised, has also escaped repression to some 
degree. Since the forbidden strivings have af- 
fect associated with them, remembered dreams 
might be expected to be associated more often 
than not with emotional overtones, and, be- 
cause of the conflict, with unpleasant emo- 
tions more often than with pleasant ones. 
Dreams which awaken the dreamer are pre- 
dicted to be more frequently accompanied by 
feelings of anxiety than dreams which do not. 
Allowing for continuous modifications to es- 
cape the experience of anxiety, it is also pre- 
dicted that dreams followed by continued 
sleep are more likely to be recalled as pleas-
ant than dreams recalled as occurring either
before being awakened or before normal 
morning wakening. Dreams with “neutral” or 
no affect, it is hypothesized, are more charac- 
teristic of people who recall fewer dreams in 
general, illustrating a rather generalized re- 
pressive function in such people. 


PROCEDURE 




The subjects (Ss) were 45 teachers and school
guidance workers who were students in a graduate
summer session; each produced at least one dream
during the experimental period. There were 41 women
and 4 men, 19 Negroes and 26 whites; their average
age was 37.11 years, with a range from 22 to 57
years.

Each S submitted a report each day for 28 days
containing an account of all dreams he had had dur-
ing the preceding 24 hours, or a checked statement
that he had had none. Detailed instructions were
given as to what these reports were to contain. Each
dream was to be identified as to whether it occurred
during the falling asleep period (F), just before nor-
mal morning wakening (M), or whether it awakened
them during the night (W); dreams not falling into
any of these categories were left unmarked, and will
be designated as indeterminate (I). Feelings during
and after the dream were to be recorded. Reminders
of these aspects of the report were printed on each
of the 28 report forms, and oral reminders were also
given. Dreams were to be written immediately upon
wakening in the morning. If more than one dream
occurred during the night, the S was to draw a line
between the reports and to number the dreams.
Dreams were reported in detail.

All reports were turned in anonymously. On the
first day of class, each S filled out a personal data
sheet on which there was a code number; all later
material was identified by code number.


RESULTS 


1 The instructions and samples of the report sheets
have been deposited with the American Documenta-
tion Institute. Order Document No. 6015 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress; Washington 25, D. C.,
remitting in advance $1.25 for microfilm or $1.25
for photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.

With 45 Ss having approximately 5 dreams
a night for 28 nights, total recall would have
resulted in something like 6,300 dreams.
Actually, a total of 296 dreams was collected,
dream recall ranging from 1 to 18 per person,
with the median at 5 dreams. Insofar as they
could be inferred from interest expressed by
Ss, from their prompt turning in of material,
and from the nature of the dream reports,
cooperation and motivation were high; the
productivity of the sample was comparable
to that of a previous group of Ss (Schonbar,
1959).
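
The size of the gap between possible and actual recall follows from the figures just given. The short sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
subjects = 45
dreams_per_night = 5        # approximate figure cited from the EEG studies
nights = 28

expected_total = subjects * dreams_per_night * nights
recalled = 296

recall_rate = recalled / expected_total
print(expected_total)            # 6300
print(f"{recall_rate:.1%}")      # 4.7%
```

On these assumptions fewer than 1 dream in 20 was reported.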

All of the hypotheses were tested by chi 
square. However, in combining dreams for 
different individuals, it was necessary to in- 
sure that the relative proportions of the tem- 
poral and affective components of the dreams 
were approximately the same for each indi- 
vidual regardless of the number of dreams 
contributed. If the relative compositions are 
approximately the same, the combined results 
represent independent observations. Accord- 
ingly, the sample of dreamers was divided 
into quarters according to number of dreams 
recalled; for each group of persons, the dis- 
tribution of dreams according to the signifi- 
cant variables was determined. The four dis- 
tributions for each variable were then in- 
spected. Although minor differences were 
found, the four distributions did not vary in 
any systematic manner, and, by inspection, 
the variability of the distribution for indi- 
viduals within each group seemed sufficiently 
large to assure considerable overlap.² Nevertheless,
because of the relatively small number
of dreams represented in the lowest quarter,
and in order to insure greater homogeneity,
it was decided to retain the median
division and, in essence, to test each hypothesis
twice, once for dreams of those above the
median in recall frequency (Group H, N =
19), and again for the dreams of those below
the median (Group L, N = 19); since 7 Ss
had recalled 5 dreams, exactly at the median,
their dreams were omitted from the study,
leaving a total of 261, 213 for Group H and
48 for Group L.
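The median division can be sketched as follows. The per-subject recall counts are not given in the paper, so `toy_counts` is a hypothetical list used only to show the rule (Ss exactly at the median are dropped); the final lines check the group bookkeeping against the totals reported in the text.

```python
from statistics import median


def median_split(counts):
    """Split per-subject recall counts at the median, setting ties aside."""
    m = median(counts)
    high = [c for c in counts if c > m]   # Group H: above the median
    low = [c for c in counts if c < m]    # Group L: below the median
    tied = [c for c in counts if c == m]  # omitted from the analysis
    return m, high, low, tied


# Hypothetical counts for illustration; not the study's data.
toy_counts = [1, 2, 3, 5, 5, 8, 12, 18]
m, high, low, tied = median_split(toy_counts)

# Bookkeeping from the text: 7 Ss at the median contributed 5 dreams each.
subjects_kept = 45 - 7
dreams_kept = 296 - 7 * 5
print(m, high, low, tied, subjects_kept, dreams_kept)
```

With the reported figures, 38 Ss remain (19 in each group) and 261 dreams remain (213 plus 48), which agrees with the totals in the paragraph above.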

In order to determine the nature of the
feelings associated with the dreams, the investigator
listed all feeling terms recorded as
having occurred during or after the dreams;
dream content other than stated feeling was
not used. The 106 terms were then arranged
alphabetically and submitted to four psychologists
(including the researcher) with instructions
to place each feeling into one of
three categories: neutral (N), pleasant (P),
or unpleasant (U); the U category was then
divided into anxious (A) and nonanxious
(NA). Agreement by three of the four judges
was the criterion for accepting a given designation.
In the few cases where there was no
consensus as to N, P, and U, the feeling was
classified as N; if there was no majority regarding
A, the feeling was classified as NA.
There was no instance in which anyone classified
as A any feeling which was not agreed
upon as U. There were many instances in
which the S wrote “no feeling” or did not
record one; these were classified as N.

2 The author wishes to express appreciation to
Rosedith Sitgreaves for suggesting the method of
dealing with this problem statistically. The method
essentially conforms to the implications of Sutcliffe’s
(1957) discussion of the decomposition of the chi
square.
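The judging rule for feeling terms can be written out as a small sketch. It assumes that "majority" for the anxious designation means the same three-of-four criterion used for N, P, and U (with four judges, three is the smallest majority); the function name and the vote encoding are illustrative and not part of the original procedure.

```python
from collections import Counter


def classify_feeling(valence_votes, anxiety_votes=(), needed=3):
    """Apply the three-of-four agreement rule to one feeling term.

    valence_votes: one of "N", "P", "U" from each judge.
    anxiety_votes: "A" or "NA" from each judge (used only for U terms).
    """
    label, count = Counter(valence_votes).most_common(1)[0]
    if count < needed:
        return "N"  # no consensus on N, P, U: classified as neutral
    if label != "U":
        return label
    anxious = sum(1 for vote in anxiety_votes if vote == "A")
    return "U(A)" if anxious >= needed else "U(NA)"


# Hypothetical votes for illustration.
print(classify_feeling(["P", "P", "P", "U"]))
print(classify_feeling(["N", "P", "U", "U"]))
print(classify_feeling(["U", "U", "U", "N"], ["A", "A", "A", "NA"]))
print(classify_feeling(["U"] * 4, ["A", "A", "NA", "NA"]))
```

A missing or "no feeling" entry would simply be passed through as N before this step, as the text describes.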

The distributions of the 261 dreams among 
the significant variables—I, W, M; N, P, 
U(A), and U(NA)—may be found in Table 1. 
One-tailed tests of significance were applied. 

Hypothesis 1. For each group, those dreams 
reported as M, W, and both (MW) were 
tested against a theoretical 50-50 split. For 
Group H, χ² = 1.70, and p > .10; for Group
L, χ² was 1.34, p > .10. There is no support
for the notion that more dreams are remem- 
bered as having preceded a waking period 
than as having preceded a nonwaking period. 

Hypothesis 2. To test whether M and MW 
dreams are proportionately more frequently 
remembered than others, it was assumed that 
the period in question constituted about one- 
quarter of the total sleep cycle. Hence, M 
and MW dreams were expected to exceed 
25% of the total. For Group H, χ² was 2.32,
p > .05-.10; for Group L, χ² = 4.00, p <
.01-.02. Thus, morning dreams are propor- 
tionately more frequently recalled than dreams 
from the rest of the sleep cycle by people who 
recall relatively few dreams, but not by those 
whose dream recall is greater. 

Hypothesis 3. Group H recalled 83 N 
dreams, 130 containing feelings; against a 
50-50 probability, χ² was 9.62, p < .001.
Group L, on the other hand, recalled 25 N 
dreams and 23 dreams containing feelings; no 
test of significance was necessary since the di- 
rection, if any, was opposite to that predicted. 
Thus, emotional components are associated 
with recalled dreams significantly more often 
than not by those remembering many dreams, 
but this is not true of infrequent recallers. 
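A test of two counts against a 50-50 expectation, as used for Hypothesis 3, can be sketched as below. This is a generic one-degree-of-freedom chi square, with an optional Yates continuity correction as an assumption; the paper does not spell out its exact computation. For the Group H counts (83 and 130) the sketch gives about 10.4 uncorrected and about 9.9 corrected, close to but not identical with the printed 9.62. Function names are illustrative.

```python
import math


def chi_square_even_split(a: int, b: int, yates: bool = False) -> float:
    """Chi square (1 df) for two observed counts against a 50-50 split."""
    expected = (a + b) / 2
    total = 0.0
    for observed in (a, b):
        diff = abs(observed - expected)
        if yates:
            diff = max(diff - 0.5, 0.0)  # continuity correction (assumed)
        total += diff ** 2 / expected
    return total


def one_tailed_p(chi2: float) -> float:
    """Half the upper-tail probability of a 1-df chi square."""
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))


group_h = chi_square_even_split(83, 130)
group_h_corrected = chi_square_even_split(83, 130, yates=True)
group_l = chi_square_even_split(25, 23)
print(round(group_h, 2), round(group_h_corrected, 2), round(group_l, 2))
print(one_tailed_p(group_h_corrected) < 0.001)
```

Either version of the Group H statistic falls beyond the .001 level on a one-tailed test, in line with the reported result; for Group L the counts lean the other way (25 neutral against 23 with feelings), which is why no test was run.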
TABLE 1

NUMBER OF DREAMS IN EACH TEMPORAL AND FEELING CATEGORY
FOR GROUPS H (N = 19) AND L (N = 19)

[Table body largely lost in extraction. Rows (Time, by
Group): Indeterminate(a), Awoke Dreamer(b), Awoke Dreamer
in Morning, Morning, Total. Surviving Neutral totals:
Group H, 83; Group L, 25; Combined, 108.]

a Includes 7 dreams identified as F (Falling asleep).
b Includes 4 dreams identified as FW (Awoke the dreamer
while he was falling asleep).

Hypothesis 4. Of the dreams having feelings 
other than N ascribed to them, the number of 
P and U feelings was tested against a 50-50
expectancy. For Group H, χ² was 37.7, p <
.0001. For Group L, χ² was 2.14, p > .05-.10.
The recalled dreams of frequent recallers are 
not only characterized by having more feel- 
ings than neutral emotional components, but 
also by more unpleasant than pleasant feel- 
ings. The reported dreams of infrequent re- 
callers, on the other hand, are not only more 
emotionally neutral, but, when feelings are re- 
membered, they are not more likely to be un- 
pleasant than pleasant. 

Hypothesis 5. Dreams which awakened the
Ss (W, MW) were divided into those with
anxiety and those without; for Group H, this
distribution was tested against the distribution
of anxiety and its absence in all other
dreams. χ² was 65.26, significant beyond
.0001. This hypothesis was not tested for
Group L because the frequencies were too
small, although they fell in the predicted
direction. It may be concluded that, while
anxiety is not experienced in a majority of Group
H’s W and MW dreams, a significantly greater
proportion of anxious feelings is associated
with these dreams than with those which did
not awaken the individual.

[Table 1, continued. Feelings columns: Pleasant;
Unpleasant (Anxious, Nonanxious). Surviving entries
after the Nonanxious heading: 10, 5, and, apparently
as Group H, Group L, and Combined totals, 31, 8, 39.]

Hypothesis 6. The prediction that dreams 
recalled as having occurred at indeterminate 
times during the night and followed by con- 
tinuous sleep contained more pleasant feel- 
ings than other dreams was tested in the same 
manner for Group H, Group L not having 
high enough frequencies for meaningful testing.
χ² was 7.15, significant at the .003 level.
Again, although most of these dreams were 
not remembered as pleasant, they were re- 
membered as being significantly more pleas- 
ant than all others. 

Hypothesis 7. Group H designated 83 of 
its 213 dreams as being neutral in feelings; 
Group L, 25 of its 48. When the distribution 
of N and other feelings in Group L was com- 
pared with an expected frequency based upon 
the distribution in Group H, χ² was 5.28,
significant at the .01 level. In addition, this
hypothesis received inadvertent support in the
testing of Hypothesis 3. People who recall 
fewer dreams remember proportionately more 
of them as neutral in feeling than do people 
whose dream recall is greater. 
DISCUSSION


The experimental evaluation of any theory 
is a two-step process. First, it must be de- 
termined whether the events or processes as- 
sumed, implied, or predicted by the theory are 
confirmed by observation. Second, and more 
difficult, is the matter of whether the given 
theory explains the observations more ade- 
quately or more economically than other al- 
ternatives. 

For example, Freud said, “Dreams are the 
GUARDIANS of sleep . . .” (1953, p. 233). 
They occur, then, when sleep is endangered. 
And investigations into the physiological cor- 
relates of dreaming have demonstrated that 
dreaming does occur during periods of lighter 
sleep. A necessary factual prerequisite has 
thus been established. But it does not necessarily
follow that these “new laboratory
experiments . . . have corroborated Freud’s
brilliant guess” (Robinson, 1959, p. 52).
Freud would maintain that unconscious wishes 
have brought about the lighter sleep by striv- 
ing for expression, that the dream puts a 
stop to this so that sleep may continue. But 
it may also be, of course, that the dream it- 
self somehow interferes with the depth of 
sleep. Thus, the discovery of the correlation 
of dreaming and lighter phases of sleep is a 
necessary but not sufficient condition for sup- 
port of Freud’s theory. 

The research reported here is similarly con- 
cerned with establishing observationally the 
verification of some assumptions or implica- 
tions of Freud’s views. For example, it was 
found that, in general, dreams were not better 
remembered simply because they precede a 
waking period, and that, for those who recall 
relatively more dreams, the period just be- 
fore normal waking did not produce more 
than its proportionate share of recalled 
dreams. Not only do these findings fail to
support Gutheil’s statements, but they cast
considerable doubt upon any notion that
dream recall is primarily dependent upon
factors similar to those studied by Ebbinghaus
and others: recency, opportunity for
recitation, greater length, possibly greater
rationality, and lack of opportunity for
retroactive inhibition. Seemingly, more dynamic
selective factors are operating.


Similarly, if dreams merely repeat events 
of the day before, or are arbitrary representa- 
tions of digestive processes, or responses to 
fortuitous external stimulation, then it should 
not have been found, as it was for Group H, 
that the dreams were accompanied by emo- 
tion more often than not, or that the emotion 
was more often unpleasant than pleasant. But 
these findings are necessary to a theory which 
postulates that the dream represents conflict 
which is important to the dreamer. 

The finding that dreams which awakened
the sleeper were proportionately more often 
identified as anxiety dreams than were dreams 
followed by further sleep or normal waking 
directly supports one of Freud’s theoretical 
statements. The finding that dreams followed 
by continued sleep contain more pleasant 
feelings than do other dreams is somewhat 
ambiguous. It would seem that these dreams 
might be the most “successful” in Freud’s 
sense—at least of remembered dreams—not 
disturbing sleep, and possibly most disguised 
in the sense of being remembered as enjoy- 
able; one cannot, however, wholly discount 
the likely possibility that the unpleasant as- 
pects of these dreams became the victim of 
further repression during sleep, but there is 
no way of finding out from these data. It 
should be emphasized that even these dreams 
are not characterized as pleasant; it is rather 
that pleasant feelings are more likely to be as- 
sociated with them than with others. 

This study has replicated, in a nonclinical 
situation and with a nonpatient sample, the 
clinical procedure in which people report
dreams which they remember, thus providing
material similar to that upon which Freud 
and other psychoanalysts have made their ob- 
servations. The findings of this study, at least 
with people who tend to recall dreams, con- 
firm the validity of the observations upon 
which some aspects of Freud’s dream theory 
were built. 

There was only one prediction concerning 
the relationship between feelings in dreams 
and the greater or lesser tendency to recall 
dreams. This was that recallers of relatively 
few dreams would also remember them as 
being more neutral in feeling than would 
more frequent recallers, and this was confirmed.
In addition, dreams recalled by the
low recallers occurred disproportionately more
often from the period preceding morning 
waking and, if feelings were attributed to the 
dreams, they were not more likely to be un- 
pleasant than pleasant. It may be that a 
larger sample of dreams from infrequent re- 
callers might have reversed the latter finding 
but, as it stands, it would seem that people 
who recall few dreams also recall them as be- 
ing fairly bland or less unpleasant than do 
people who recall more frequently. It is pos- 
sible that the less frequent recallers do not 
have so many conflicts so that their dreams 
are, in fact, more bland. But it is at least 
equally possible, and seems more likely, that 
people who repress more of their available ex- 
perience, as in forgetting dreams, reveal this 
repression rather generally by also toning 
down the affect. Previous research (Schonbar, 
1959; Singer & Schonbar, 1959) has found
that dream recall is positively related to mani- 
fest anxiety, but, since the latter was meas- 
ured by conscious self-report, the dilemma is 
merely emphasized rather than resolved. A 
similar question arises concerning the finding 
(Singer & Schonbar, 1959) that repression 
(MMPI R scale) and dream recall are nega- 
tively correlated. But there is also evidence 
(Schonbar, 1959) that people who recall no 
dreams also tend not to recall even the process 
of dreaming. It would thus seem that these 
people exhibit a pattern of repression or lack 
of awareness of the presence and nature of 
their own dream processes. A pattern is sug- 
gested, wherein people who tend to be aware 
of their own internal experience remember 
more dreams and more of the affect associ- 
ated with them, while less aware individuals 
remember fewer dreams and blander affect. 
In summary, then, the findings of this study 
support the underpinnings of some aspects of 
Freud’s theory of dreams and fail to support 
Gutheil’s contention. From the more difficult 
point of view of theoretical adequacy, it is 
worth noting that, while Freud attributed the 
memory of anxious (and possibly of unpleas- 
ant) dreams and the existence of dreams 
which disturb sleep to a breakdown or fail- 
ure of ego function, other analytic theorists 
(Fromm, 1951; Hadfield, 1954) would give 
credit for these events to a successful breakthrough
of self-realizing, insight-producing
forces. But the same kind of substructure of
intrapsychic conflict is assumed by them as
by Freud, and the findings of this research,
therefore, offer confirmation of their
views also.


SUMMARY 


Forty-five graduate students in education 
turned in reports on recalled dreams every 
day for 4 weeks. On these reports were in- 
cluded information concerning the time dur- 
ing the sleep cycle when the dream occurred, 
and what kinds of feelings were associated 
with it. The total group was divided into two, 
above and below the median in dream recall. 
One-tailed chi square tests were used to test 
predictions based primarily upon formulations 
drawn from Freud’s theory of dreams. It was 
found for both groups that dreams preceding 
a waking period are not better remembered 
than dreams followed by continued sleep, that 
dreams which awaken the sleeper are propor- 
tionately more often associated with anxiety 
than dreams which do not, and that dreams 
which are followed by continued sleep are re- 
called as proportionately more pleasant than 
dreams followed by any kind of waking. For 
the frequent recallers, it was also found that 
dreams are more often remembered as having 
had emotional components than as having 
been neutral, that the feelings are more often 
unpleasant than pleasant, and that the pe- 
riod just before normal morning waking does 
not produce more than its proportionate share 
of remembered dreams. For the group which 
was low in recall, the recalled dreams did not 
contain more emotional than neutral attri- 
butes, and feelings were not more unpleasant 
than pleasant; more dreams were remembered 
by this group from the period just preceding 
normal waking than would be expected. In 
addition, a direct comparison of the two 
groups revealed, as predicted, that the low 
recall group had significantly more neutral 
dreams than the high recall group. In gen- 
eral, it was concluded that the findings of 
this study support some of the propositions 
in Freud’s theory of dreams. The study is not 
seen as a crucial test of theory. 
ANXIETY, PREGNANCY, AND CHILDBIRTH
ABNORMALITIES¹


ANTHONY DAVIDS, SPENCER DeVAULT, AND MAX TALMADGE


Brown University and Emma Pendleton Bradley Hospital 


Since the development of a rather simple 
instrument for assessing manifest anxiety 
(Taylor, 1953), there has been an epidemic 
of psychological studies concerned with the 
role of anxiety in a wide range of experi- 
mental situations. Here, we will not attempt 
to survey this vast literature. We wish merely 
to point out that these studies of anxiety have 
been conducted mainly in laboratory and aca- 
demic research settings, and little use has 
been made of the instrument in clinical or 
“real life” situations. The developers of the 
instrument and many of their followers have 
stated (Taylor, 1956) that they are not really 
concerned with measuring anxiety, but are in- 
terested in obtaining a measure of “drive.” 
This concept of drive is viewed within the 
framework of Hullian learning theory. Ac- 
cording to this theory, all habit tendencies 
activated by a given stimulus are considered 
to be multiplied by the total drive then oper- 
ating. Employing the Manifest Anxiety Scale 
(MAS) to provide a measure of drive strength, 
the performances of subjects selected on the 
basis of high or low anxiety scores have been 
compared on such measures as eyelid conditioning
(Hilgard, Jones, & Kaplan, 1951;
Spence & Farber, 1953; Spence & Taylor,
1951; Taylor, 1951), verbal learning (Lucas,
1952; Montague, 1953; Taylor & Spence,
1952), word association (Davids & Eriksen,
1955), and various other more complex tasks
(Farber & Spence, 1953; Wesley, 1953;
Westrope, 1953).

1 This study was made possible by a research
grant, B-2356, from the National Institute of
Neurological Diseases and Blindness, United States Public
Health Service, awarded to the Brown University
Institute for Research in the Health Sciences. The
present report stems from an ancillary study to the
National Collaborative Project, conducted locally at the
Providence Lying-In Hospital, which is investigating
perinatal factors in child development. We wish to
express our appreciation to Glidden Brooks, who is
Director of the Research Institute at Brown University,
for facilitating this study. Also, we are indebted
to the clinic staff of the Providence Lying-In
Hospital for their cooperation and assistance.

There have been some attempts to assess 
the clinical validity of the MAS (Buss, 
Wiener, Durkee, & Baer, 1955; Gleser & 
Ulett, 1952; Hoyt & Magoon, 1954; Kendall, 
1954), and in general it does seem to be as- 
sociated with clinical evaluations of anxiety. 
Moreover, Eriksen and Davids (1955) re- 
ported finding significant personality differ- 
ences between subjects who scored high or 
low on the MAS, and also differences in 
psychological defense mechanisms. More spe- 
cifically, it was found, in a group of male col- 
lege students, that subjects who were high on 
the MAS were also pessimistic in their out- 
look on life and were relatively low on uti- 
lization of the mechanism of repression ac- 
cording to the evaluation of an experienced 
psychoanalyst. 

It seems, then, that the MAS has demon- 
strated utility as a research instrument and 
has generated considerable interesting re- 
search. However, since most personality theo- 
rists place great emphasis on anxiety as a 
motivating factor in life adjustment, and since 
it is a well established fact that anxiety plays 
a crucial role in the formation of psycho- 
pathology, it seems worthwhile to conduct 
further research on the clinical utility of this 
objective instrument for assessing manifest 
anxiety. 

At present, there appears to be increasing
research interest in the effects of anxiety and
stress on the psychological course of pregnancy
and the influence that emotional turmoil
during pregnancy may have on the subsequent
adjustment of the offspring. In a
study of physical and mental handicaps following
disturbed pregnancy, Stott (1957)
suggested that prenatal influences were to
blame. In studying a group of mentally defective
children, he found that in a large proportion
of the cases there had been marked
emotional stress during pregnancy, as a result
of family conflicts and personal unhappiness.
In a recently reported study of the influence
of prenatal maternal anxiety on emotionality
in rats, Thompson (1957) tested
and confirmed the hypothesis that “emotional
trauma undergone by female rats during pregnancy
can affect the emotional characteristics
of the offspring.”

The plan of the research program from 
which the present report derives is to use a 
variety of psychological procedures to study 
emotional factors in pregnant women. This 
report, however, is concerned specifically with 
findings obtained from the MAS administered
to a group of women during pregnancy and 
readministered soon after delivery of their 
children. 


METHOD 


The subjects of this investigation were 48 preg- 
nant women who were studied at the Clinic of the 
Providence Lying-In Hospital. They are a repre- 
sentative sample of a larger group of women who 
were studied in the course of a pilot study con- 
ducted by a team of medical and scientific investi- 
gators who were engaged in a collaborative project 
on perinatal factors in child development. The women 
were seen for individual psychological testing dur- 
ing the course of a routine visit to the clinic, which 
in most cases was at approximately the seventh 
month of pregnancy. Of the group of 48 women, 20 
returned for a routine physical checkup at approxi- 
mately 6 weeks following childbirth, while the other 
28 women failed to return for this scheduled hospital 
visit. The 20 patients who were seen twice will be 
labeled Group I, and the 28 women who were seen 
only during pregnancy constitute Group II.

In the course of the large scale investigation,
voluminous data were gathered for each patient. As
part of the assessment, they were administered a
comprehensive battery of psychological tests.
Included in this assessment procedure was the 50-item
MAS, which is the focus of the present report. In
Group I, the MAS was administered both during
and following pregnancy, while in Group II it was
administered only during pregnancy. On the basis of
the official hospital records, it was possible to
classify each patient’s delivery room record as “normal”
or as indicating some “abnormality or complication.”
In Group I, there were 13 patients in the normal
category and 7 patients in the abnormal category.
In Group II, the subdivisions were 12 normal
deliveries and 16 with abnormalities or complications.

The patients in both groups were of “normal” in- 
telligence. As measured by the Wechsler-Bellevue In- 
telligence Scale, the mean IQ in Group I was 101 
and the mean IQ in Group II was 95. Moreover, in 
both groups the mean age was 25 years and ranged 
from 17 to 40 years. Thus, although no attempt was 
made to match the patients in the two groups, it 
happened that the groups were of very similar age 
and IQ, and in regard to these two variables it seems 
probable that they are representative of pregnant 
women who are being studied at various clinics
throughout the country.


RESULTS AND DISCUSSION

Now let us consider the findings from the 
MAS. In Group I, on the first testing, the 
normal subgroup obtained a mean manifest 
anxiety score of 16.5, which is significantly 
lower (t = 2.19, p = .05) than the mean of
23.5 in the abnormal subgroup. Examination 
of the ranges of the manifest anxiety scores 
in the two subgroups further evidences the 
greater anxiety in the abnormal group, with 
their scores ranging from 14 to 37, as com- 
pared with scores ranging from 8 to 26 in the 
normal group. Thus, both the mean scores 
and the spread of the individual scores re- 
veal the abnormal delivery group to have been 
relatively high on manifest anxiety according 
to their own avowal of feelings and symptoms 
during pregnancy. In analyzing the results 
from the second testing of the patients in 
Group I, it is noteworthy that the level of 
manifest anxiety decreased in both subgroups 
following pregnancy, with a mean of 15 in 
the normal subgroup and a mean of 18.3 in 
the abnormal subgroup. Although the group
that experienced difficult deliveries continued 
to score higher on manifest anxiety than did 
the group who had normal delivery room experiences,
the nonsignificant difference (t = .70)
was not as pronounced as it was when
the women were in a state of pregnancy. 

The findings in regard to manifest anxiety 
in Group II were remarkably similar to those 
obtained in Group I. In this second group of 
patients, the mean MAS score in the normal 
subgroup was 16, which is significantly lower 
(t = 2.39, p = .05) than the mean score of 
23.6 in the abnormal subgroup. Again, the 
range of MAS scores from 4 to 30 in the normal
subgroup was noticeably lower than the
range from 12 to 38 in the abnormal sub- 
group. Thus, in both samples studied in this 
research, it was found that women who were 
later to experience complications in delivery 
or were to give birth to children with abnor- 
malities tended to report a relatively high 
amount of disturbing anxiety while they were 
pregnant. 

In considering these findings, it should be 
emphasized that at present we have no infor- 
mation regarding the causes or reasons under- 
lying the higher MAS scores in the abnormal 
subgroup. One possibility is that the obste- 
tricians may have anticipated abnormalities 
or complications, and may have conveyed 
this information to the patients. However, 
this possibility does not seem too likely, as for 
the majority of these patients the psychologi- 
cal assessment was conducted during their 
first visit to the clinic. That is, these women 
did not have private obstetricians who fol- 
lowed their medical progress throughout the 
pregnancy, but were being seen for their first 
medical examination at a rather late stage of
their pregnancy. Future examination of so- 
ciological, medical, and past history data on 
these clinic patients may provide some under- 
standing of causative factors, and greater un- 
derstanding in this regard may well come 
from comparisons of clinic and private pa- 
tients. One other point that should be made 
at this time, however, is that there was no 
difference between the two subgroups in re- 
gard to the number of patients for whom this 
was the first delivery. The mean number of 
previous pregnancies and previous deliveries 
was practically identical in the normal and 
abnormal subgroups. 

It is also interesting to note that the mean 
MAS scores of about 16, obtained in the nor- 
mal subgroups both during and after preg- 
nancy, are very similar to the mean MAS 
scores obtained previously in relatively large 
samples of female college undergraduates 
(Smith, Powell, & Ross, 1955; Taylor, 1953). 
The present findings suggest, therefore, that, 
as a group, pregnant women who will later 
experience normal childbirth do not differ 
from normal nonpregnant college females in 
the avowal of manifest anxiety, but pregnant 


women who are likely to experience child- 
birth abnormalities later are significantly 
higher on manifest anxiety than are other 
groups of pregnant and nonpregnant women. 

The results of this preliminary study, which 
should be regarded as tentative and in need 
of further independent confirmation, are quite 
encouraging. In addition to demonstrating the 
utility of the MAS in this clinical setting, the 
positive findings obtained with this objective 
instrument suggest that even more fruitful re- 
sults may be obtained through use of projec- 
tive techniques designed to uncover indices of 
emotional factors operating at deeper levels in 
the personality. It is hoped that the intensive 
program of investigation we have embarked 
upon will eventually lead to greater psycho- 
logical understanding of complex relations be- 
tween maternal psychodynamics during preg- 
nancy and the process of child development. 


SUMMARY 


The purpose of this research was to com- 
pare measures of manifest anxiety obtained 
during pregnancy and following childbirth, 
and to relate these anxiety measures to de- 
livery room experiences. In two independent 
samples of clinic patients, women who were 
later to experience complications in the de- 
livery room or were to give birth to chil- 
dren with abnormalities obtained significantly 
higher manifest scores during pregnancy than 
did women who later had “normal” delivery 
room records. The results obtained from re- 
testing one of the samples shortly after child- 
birth showed decreased levels of manifest anx- 
iety both in patients who had undergone nor- 
mal childbirth and those who had experienced 
complications or abnormalities. Manifest anx- 
iety scores were still relatively higher in this 
latter subgroup, but the difference was no 
longer significant. It was concluded that these 
findings demonstrate the clinical utility of 
the Manifest Anxiety Scale, and also suggest 
that utilization of projective methods in fu- 
ture research may lead to greater psychologi- 
cal understanding of the role of emotional 
factors in pregnancy and childbirth. 
REPEAT STUDY WITH A PROJECTIVE FILM
FOR CHILDREN


MARY R. HAWORTH¹

Rock-A-Bye, Baby (Haworth, 1960; Haworth
& Woltmann, 1959) is a projective
puppet film which can be shown to groups of 
children. The film story focuses on a little 
boy, Casper, and his jealousy of his baby 
sister. When left to baby-sit, he begs the 
witch to help him get rid of the baby. She 
puts a spell on the milk; mother returns and 
rushes the baby to the hospital. Casper is 
filled with remorse, recalls the witch, and 
finally kills her. Thus the spell is broken, the 
baby’s health is restored, Casper’s guilt is re- 
solved, and his parents reassure him of their 
love by a gift of strawberry ice cream. Wolt- 
mann (1951) gives the complete script of the 
play, as well as the rationale for the use of 
puppets in projective devices for children. 

The film is shown to entire classes, divided 
into groups of 10 to 15 children per showing. 
Responses are first secured halfway through 
the showing when the film is stopped and each 
child in the group is invited to finish the story. 
After the rest of the film is shown, each child 
is asked, individually, a standard set of ques- 
tions in terms of Casper: what he thought of 
his parents and of the witch, how he felt when 
the baby got sick, whether he should be pun- 
ished for what he did, what he should tell his 
mother, and how he felt when the baby got 
well. The child is also asked what part he, 
himself, liked best and which character he 
would like to be. 

The film was originally administered to 244
children, from nursery school through fifth
grade, as reported by Haworth (1957). A
scoring scheme (Haworth & Woltmann, 1959)
was devised based on patterns of deviant
responses² given to the standard questions and
in the group discussion during the showing.
The following indices emerged as representing
dimensions of personality that appear to
be tapped by this particular film: Identification,
Jealousy (sibling rivalry), Aggression
toward Parents, Guilt (masturbatory), Anxiety
(castration), and Obsessive Trends.

1 The author is indebted to the principals and
teachers who cooperated in the project, and to Mary
Grummon, Ruth Karslake, and James Mathie who
assisted in interviewing the children and scoring the
protocols.

The film has subsequently been shown to 
a new sample of 257 children (kindergarten, 
first, and second grades) in order to ascertain 
whether similar proportions of children would 
score high on the various indices, and whether 
the developmental progressions which ap- 
peared to be demonstrated in the earlier study 
would be substantiated in the second sample. 
A cross-validation analysis was planned for 
the two indices (Guilt and Jealousy) for which 
criterion groups can be selected from the
samples.³ One further aspect of the present
study is concerned with scoring reliability.


THE SAMPLES 

Table 1 shows the distribution of children, by
grades, in the first sample (A, Pennsylvania) and in
the second sample (B, Michigan).⁴

Sample A was rather heavily weighted toward the
upper professional levels since 95 of the 244 children
were from school areas serving predominantly univer-
sity faculty and professional and managerial groups.
The remaining 149 children were drawn from a small

2 Deviant responses are those given by less than
10% of a particular age or sex grouping.

3 A subsequent report will be concerned with a
validation study in which 15 children who scored
high on the Obsessive Index, or on both the Guilt
and Anxiety Indices, were matched with 15 low
scoring children and given a battery of individual
projective tests.

4 Discussion of the nursery school group has been
omitted as many of the responses were too meager
to be scored.

TABLE 1
DISTRIBUTION OF SAMPLE
[Table body not recoverable; surviving fragments: column headings Nursery School and Kindergarten, and one cell value, 40.]


suburban community representing all occupational
levels. In Sample B, half (128) were attending a
school which serves the entire range of occupational
levels, while 129 children came from a marginal dis-
trict of predominantly lower class families.

As the original study was particularly concerned
with the responses of the large first grade group, ap-
proximately the same number of first graders were
secured in the second sample for comparative pur-
poses. The groups will hereafter be referred to by
number and letter, e.g., 1-A indicates the first grade
of Sample A; K-B, kindergarten of Sample B.


RESULTS 
Index Scores and Developmental Progressions 


A comparison of high scores and deviant
identification choices⁵ made by the two first
grade samples revealed only one substantial
difference: significantly more children of 1-A
made deviant identification choices (χ² =
3.805; p = .051). The specific item that ac-
counted for most of this difference was identi-
fication with the opposite-sex parent, with
this choice being made more often by 1-A
than by 1-B children.
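
As a rough check on the reported probability, the sketch below converts a chi-square value into a p value. It assumes 1 degree of freedom (a 2 x 2 table of sample by deviant versus nondeviant choice), which the text does not state; the function name is illustrative only.

```python
import math


def chi2_p_value_df1(chi2: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 df.

    With 1 df, chi-square is the square of a standard normal deviate,
    so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    if chi2 < 0:
        raise ValueError("chi-square cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


# Chi-square value reported in the text; df = 1 is an assumption.
p = chi2_p_value_df1(3.805)
print(f"p = {p:.3f}")
```

Under that assumption the computed value agrees with the reported p to three decimals, which places the difference just short of the conventional .05 level.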

Figure 1 demonstrates the otherwise close 
correspondence between the two first grades, 
and includes the fifth grade curve for com- 
parison of the incidence of high scores on each 
index at different ages. 

Developmental progressions for each dimen- 
sion are shown in Figures 2 and 3. Aggression, 
Guilt, and Anxiety show congruent curves 
(Figure 2) with a fairly large incidence of 
high scores in the early grades and a decided 
drop occurring between second and third 


5 Tabulations were made of only the five identifi-
cation choices which are always deviant, as distin-
guished from another category of choices which are
deviant at certain ages or for a specific sex. Each of
the other indices requires a specified number of re-
sponses for a high score except Aggression to Par-
ents. For purposes of the present study, the use of
even one “aggressive” response is considered a high
score.
Fig. 1. Percentages of two first grades (1-A, 1-B)
and fifth grade (5-A) scoring high on each index.


grades. Figure 3 shows the three indices which 
maintain fairly constant levels in the later 
years. Jealousy and Identification start high 
and remain stabilized at the second grade 
level, while the Obsessive trends show a con- 
stant and much lower level throughout the 
age range under study. 


Cross-Validation of the Guilt Index 


The Guilt Index was originally derived 
(Haworth, 1957) from patterns of deviant 
responses given by 10 of the 12 children in 
the 1-A group who had been observed to en- 
gage in autoerotic practices (masturbation or 
thumb sucking) either during the film show- 
ing or the inquiry period. Similar response 
patterns were given by only 2 of the 100 
“nonautoerotic” children. The seven items in 


6 The Guilt Index did not prove to be applicable
to the third and fifth grades. Only 2 (out of the
Fig. 2. Percentages of high scores on Aggression to
Parents, Guilt, and Anxiety Indices from kinder- 
garten through fifth grade. (Scores are averaged for 
two first grades.) 


the index relate to: (a) mother not knowing 
what went on, (b) Casper being sent to bed
as punishment, (c) Casper feeling rejected by 
father, (d) Casper being very ashamed or re- 
solving to make amends, and references to 
(e) the baby stinking, (f) Casper or the baby 
being in the water, (g) the kissing scenes. 
Because of the guilt tinged aspect of most of 
these responses, it would appear that children 
who respond with high scores (i.e., at least 
two of the seven items) may be those who not 
only engage in autoerotic acts but who also 
feel guilty for so doing. If such responses were 
also given by the “autoerotic” children in the 
new sample, considerable validity would be 
demonstrated for this index. 
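
The scoring rule just described (a high score is at least two of the seven guilt items) can be sketched as follows. The item labels follow the (a) to (g) list in the text; the function names and the representation of a protocol as a list of checked item letters are assumptions made for illustration, not part of the published scoring scheme.

```python
# Seven Guilt Index items, keyed by the letters used in the text.
GUILT_ITEMS = {
    "a": "mother not knowing what went on",
    "b": "Casper being sent to bed as punishment",
    "c": "Casper feeling rejected by father",
    "d": "Casper being very ashamed or resolving to make amends",
    "e": "reference to the baby stinking",
    "f": "reference to Casper or the baby being in the water",
    "g": "reference to the kissing scenes",
}

# A high score is "at least two of the seven items".
HIGH_SCORE_THRESHOLD = 2


def guilt_score(checked_items):
    """Count the distinct guilt items checked for one child's protocol."""
    distinct = set(checked_items)
    unknown = distinct - GUILT_ITEMS.keys()
    if unknown:
        raise ValueError(f"unknown guilt items: {sorted(unknown)}")
    return len(distinct)


def is_high_guilt(checked_items):
    """Return True when the protocol reaches the high-score threshold."""
    return guilt_score(checked_items) >= HIGH_SCORE_THRESHOLD


# Hypothetical protocol: items (a) and (d) were checked.
print(is_high_guilt(["a", "d"]))
```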

No statistical test was performed on the 
1-A group since this was an ad hoc approach. 
In the present study, it was predicted that 
more autoerotic (AE) than nonautoerotic 
(non-AE) children would score high on the 
index. 


combined total of 92) children received high scores, 
and neither of these was one of the four observed 
“autoerotic” cases in the two grades. 


Haworth 


Table 2 shows the distribution of high 
scores (two or more items) as contrasted to 
low scores (one guilt item or none at all). 
The predictions were upheld, with signifi- 
cantly more AE than non-AE children re- 
ceiving high guilt scores; this difference was 
especially marked in kindergarten and first 
grade. 

The original criterion (Haworth, 1957) for 
inclusion of a response as an item in the in- 
dex stated that its incidence in the AE group 
(N = 12) must be at least one-third of its
total incidence for all 112 children of the 1-A 
sample. Actually, for five of the seven items, 
at least one-half of the responses came from 
the small AE group. Table 3 shows the dis- 
tribution of guilt responses in the three grades 
of Sample B to be quite similar to that of the 
original 1-A sample from which the index was 
drawn. It can be seen that, irrespective of
high scores, the AE children make more use 
of the guilt items than do non-AE children, 
so that the original criterion was still met in 
all but two instances. (These involved Item 
No. 5 which was given only once in K-B and 
in 1-B, and by non-AE children in both 
cases.) The criterion was exceeded by five
items in K-B, four items in 1-B, and three
items in 2-B.
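
The inclusion criterion (an item's incidence in the AE group must be at least one-third of its total incidence) amounts to a simple proportion check, sketched below. The function name and the counts in the first example are hypothetical; only the second example mirrors a case described in the text, Item No. 5 being given once, by a non-AE child.

```python
from fractions import Fraction


def meets_inclusion_criterion(ae_count, total_count,
                              minimum_share=Fraction(1, 3)):
    """Check whether the AE group supplies at least `minimum_share`
    of all occurrences of an item (one-third in the original criterion)."""
    if ae_count < 0 or total_count < 0 or ae_count > total_count:
        raise ValueError("counts must satisfy 0 <= ae_count <= total_count")
    if total_count == 0:
        return False  # an item never given cannot meet the criterion
    return Fraction(ae_count, total_count) >= minimum_share


# Hypothetical counts: 4 of 9 occurrences come from AE children.
print(meets_inclusion_criterion(4, 9))

# Item No. 5 in K-B as described: given once, by a non-AE child.
print(meets_inclusion_criterion(0, 1))
```

Exact fractions are used so that a share of precisely one-third is not lost to floating-point rounding.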


Cross-Validation of the Jealousy Index 


In the original study (Haworth, 1957) the 
items of the Jealousy Index were selected on 
an a priori basis from responses of the 1-A 
group which appeared to indicate sibling ri- 
valry. The 11 items of this index include: (a) 
one response of Casper being jealous of at-
tention given to the baby; (b) a minimum
of two uses by the subject of slips of tongue, 
evasions or personal references; aggression 
against the baby expressed openly (c) while 
the film was being shown, (d) in the half- 
show discussion, or (e-j) in answer to any of 
six specified inquiry questions. For boys, (k)
choosing to be the baby is an additional item. 
A high score consists of any three of the above 
responses. 

The 1-A sample was divided into two 
groups on the basis of sibling status, with 
oldest + middle children in one group, and 
youngest + “only” children in the other group. 
Significantly more of the former group scored 
high on the Jealousy Index, and the difference 
was largely due to the boys’ responses in each 
grouping. Within the oldest + middle group- 
ing, significantly more oldest than middle 
children received high scores. 

A similar analysis of the 1-B sample (as 
well as K-B and 2-B) revealed no differences 
between the various groupings: oldest + mid- 
dle vs. youngest + only, oldest + middle boys 
vs. youngest + only boys, or oldest vs. mid- 
dle children. The total incidence of high jeal- 
ousy scores was also less for 1-B (16.9%) 
than for 1-A (23.2%), but this difference was 
not significant. There were some indications
that differences in family size, ordinal posi- 
tion, or socioeconomic status might be respon- 
sible for the lack of replication in Sample B. 
Much larger samples would be required to 
secure enough high scoring cases for an analy- 
sis of these multiple variables. 


Reliability 


Three judges scored a group of 24 protocols 
pulled at random from the B sample. Inter- 
scorer reliability was computed for the four 
main indices: Jealousy, Guilt, Anxiety, and 
Obsessive Trends. (No reliability study seems 
necessary for Identification choice since quite
objective criteria can be applied; Aggression
to Parents is used qualitatively and has not
been set up in terms of number of responses 
necessary for a “high” score.) The number 
of checks given to each item of each index 
was compared for each of the three pairs of 
judges. The Rulon formula, with the Spear- 
man-Brown correction, yielded the following 
reliability coefficients for each index: 


Jealousy:   .93, .935, [illegible]   average = [illegible]
Guilt:      .78, .88, [illegible]    average = [illegible]
Anxiety:    [illegible]              average = [illegible]
Obsessive:  .94, .87, [illegible]    average = [illegible]
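
For readers unfamiliar with the two formulas named above, a minimal sketch follows. It treats the check counts of two judges over the same protocols as paired scores, which is an assumption; the text does not state exactly how the Rulon formula and the Spearman-Brown correction were combined, so the two are shown as separate functions. The ratings in the example are invented.

```python
from statistics import pvariance


def rulon_reliability(scores_a, scores_b):
    """Rulon formula: 1 - var(differences) / var(totals)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    totals = [a + b for a, b in zip(scores_a, scores_b)]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return 1.0 - pvariance(differences) / total_variance


def spearman_brown(r, factor=2.0):
    """Spearman-Brown prophecy formula for lengthening by `factor`."""
    return factor * r / (1.0 + (factor - 1.0) * r)


# Invented check counts from two judges on four protocols.
judge_1 = [1, 2, 3, 4]
judge_2 = [1, 2, 3, 5]
print(round(rulon_reliability(judge_1, judge_2), 3))
print(round(spearman_brown(0.5), 3))
```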
While the overall reliability of scorers is
quite satisfactory, the lower agreements on 
the Guilt Index were examined for possible 
causes. It was found that most of the dis- 
crepancies between judges occurred as the re- 
sult of neglecting to check, under the item 
“Father rejects Casper,” those responses in 
which the father is mentioned specifically as 
the punisher. The directions have subse- 
quently been clarified to call attention to this 
objective point. 


DISCUSSION 


A replication of the film test has revealed 
no appreciable differences between the two 
large first grade samples, except in the area 
of deviant identification choices, and in the 
sibling status (but not the incidence) of 
children responding on the Jealousy Index. 
The repeat study has also confirmed the ear- 
lier impression that developmental progres- 
sions occur in some areas while plateaus are 
maintained in other dimensions. The fact that 
similar and congruent results were obtained 
between two samples differing in location and 
socioeconomic composition demonstrates a 
certain amount of construct validity for the 
test. To put it differently, if marked differ- 
ences and discrepancies had been found, then 
very little confidence could be put in this in- 
strument as a method of personality assess- 
ment. 

As was expected on the basis of original 
findings, the younger children express more 
outspoken aggression toward parents than do 
older children, and they also score higher on 
measures of guilt and anxiety. There appears 
to be no reason to abandon the earlier hy- 
pothesis (Haworth, 1957) that this film does 
pinpoint certain problem areas related to the
oedipal period. In view of the decided drop in 
incidence of guilt and anxiety between the 
ages of 7 and 8 (by which time the latency 
period is presumed to be well underway), it 
still seems, as originally suggested, that the 
guilt measured by the film test is associated 
with masturbation and other autoerotic acts. 
(The postulated relationship between anxiety 
and castration fears is currently being studied 
via other projective techniques.) It can only 
be speculated whether the slight trend up- 
ward of the obsessive scores between the sec- 
ond and fifth grades may indicate an increas- 
ing incidence at still later ages. Fenichel 
(1945) sees an increase in obsessive reactions 
and compulsive rituals during the latency pe- 
riod as defenses become strengthened against 
the instinctual impulses. The curves in Fig- 
ures 2 and 3 may possibly be a graphic rep- 
resentation of the repression of erotic drives 
and the development of defense mechanisms. 

If identification patterns are laid down dur- 
ing the oedipal period, the incidence of devi- 
ant identifications should remain at fairly 
stable levels throughout the age range studied. 
This was found to be the case. On the basis 
of the film responses it would appear that 
jealous reactions, once established, also do 
not decline in the early latency period. The 
threat to the ego is undoubtedly not as great 
in this area as in those more closely linked 
to the oedipal situation. Consequently there
would be less need to repress or defend. In 
some instances jealousy toward siblings may 
even be serving as a substitute outlet for un- 
acceptable feelings originally directed toward 
the parents. 

The one significant difference between the 
two first grade samples—namely, deviant 
identification—may possibly be attributable 
to differences in the socioeconomic status of 
the two groups. The item responsible for most 
of this difference was the choice of the op- 
posite-sex parent by more children from the 
higher status group. This finding is consist- 
ent with that of Rabban (1950) who showed 
sex-role identification to be more clearly de- 
fined, and at an earlier age, for lower class 
children. 

With respect to the Guilt Index, as has 
been previously pointed out (Haworth, 1957), 
it is not to be expected that all autoerotic
children would feel guilty. Nevertheless, the 
fact that a repeat study still shows large pro- 
portions of them giving a specific cluster of 
responses suggests that dynamic factors are 
being tapped by the index, namely, conflicts 
between instinctual drives and conformity to 
parental standards. The validity of the Guilt 
Index for children from kindergarten through 
second grade has also been demonstrated by 
the consistently significant differences be- 
tween the number of high scores in the auto- 
erotic, as contrasted to the nonautoerotic, 
groups. 

In spite of consistent findings on the Jeal- 
ousy Index with respect to the frequency of 
children receiving high scores, the sex and 
sib-status distribution of the scores was not 
upheld in the second sample. It appears that 
high scores may be measuring attitudes to 
either older or younger siblings. In view of 
the equivocal findings, caution should be exer- 
cised in the interpretation of this index, espe- 
cially if it is the only high score in a protocol. 
In combination with high scores on other in- 
dices, it may provide useful supplementary 
data for diagnostic purposes. 


SUMMARY AND CONCLUSIONS 


The projective puppet film, Rock-A-Bye, 
Baby, was originally shown to 244 children 
from nursery school through fifth grade. The 
film has subsequently been shown to 257 chil- 
dren from kindergarten through second grade. 
The two large first grade samples showed 
close correspondence with respect to incidence 
of deviant scores on all measures except 
Identification. The consistent developmental 
progressions from grade to grade, within and
between samples, demonstrate construct va-
lidity for the instrument.

Two indices could be cross-validated by 
means of criterion groups within each sam- 
ple. The Guilt Index showed the predicted 
significant differences between autoerotic and 
nonautoerotic groups in all three grades of 
the new sample. Differences between sibling 
groupings were not upheld on the Jealousy 
Index. 

Since adequate interscorer reliability has 
been demonstrated for the instrument, and 
generally consistent kinds of data have been 
secured in a replicated study, confidence can
be placed in this technique as a group screen-
ing device in the personality assessment of
early latency children.

Standal and van der Veen (1957) have re-
cently suggested that number of interviews 
constitutes an important variable for study in 
research on psychotherapy. Their argument 
lay especially in the demonstration that of 
several measures of progress in therapy, based 
upon counselor judgments, a measure of 
change in personal integration of the client 
was not only the most important clinically 
and theoretically, but also showed the highest 
linear correlation with log number of inter- 
views. 

If it were substantiated that a very high 
correlation between length of therapy and 
change in personal integration exists, it would 
be an important finding, indeed; for it might 
be possible to employ number of interviews as 
a dependent variable of exceptional reliability 
which nevertheless has critical implications 
for personality change. There can be no doubt 
that the finding of dependent variables which 
have both high reliability and high validity 
and also relevant clinical implications is 
among the most important tasks of psycho- 
therapy research today. 

Therefore, it seems important to confirm 
this earlier finding and to attempt to deter- 
mine the most valid procedure, among pos- 
sible alternatives, for obtaining a measure of 
personal integration derived from counselor 
judgments. To illustrate the alternatives: we 
note that the counselors who made judgments 
of change in personal integration for the 
Standal and van der Veen (1957) study did 
so at the end of their series of contacts with
clients. In this procedure the counselors were 
asked to rate the integration of each client at 
termination and at the same time to scan their 
long-term memories for a rating of the initial 
level of integration. A nine-point scale rang- 
ing from “highly disorganized or defensively 
organized” (1) to “optimally integrated” (9) 
was used and change in integration was de- 
fined as the arithmetic difference between 
initial and terminal scores. 

If we consider the counselor’s thoughts in 
making such ratings it seems reasonable to 
suspect a certain bias resulting from sheer 
lapse of time involved. The longer the time 
of acquaintance experienced by the counselor, 
the greater his tendency to underestimate the 
initial level of integration. For suppose a 
counselor were really just guessing about the 
level of integration of his client a long time 
ago, his thought might well go something like 
this: “The client has been with me a very 
long time so he must have been in rather poor 
shape to begin with.” 

An alternative procedure would be to elimi- 
nate this sort of possible bias by obtaining 
judgments of the level of integration actually 
perceived by the counselor at both beginning 
and end points. Since counselor judgments are 
not only the most frequently employed cri- 
terion measures of progress, but also the most 
available, the present study was undertaken 
to compare the two rating procedures when 
applied to data comparable to that of Standal 
and van der Veen (1957). In addition it was 
hoped to tease out some of the differences, if 
any, between a measure of therapy length
based upon number of interviews as against 
one based solely on number of weeks. 
SUBJECTS AND PROCEDURE


From a single large research block of cases at the 
Counseling Center, University of Chicago, 87 clients 
had terminated therapy at the time the present study 
was undertaken. (Omitted were 6 cases which had 
started in the block but were still in therapy.) All 
clients had been seen during the period 1956-59 by 
client centered therapists. These therapists included 
both males and females, and their experience levels 
ranged from 1 to 12 years. There were 52 male
clients and 35 females. Also, 52 of the clients were
students, 35 were not. The mean age of the clients
was 28.5 years (SD = 7.6), and they had been in
therapy for a mean number of 29.5 interviews (SD
= 28.1), and for a mean number of 31.9 weeks (SD
= 22.5).


Two measures of length of therapy were taken:
the exact number of interviews and the rounded
number of weeks. Counselors were asked to make
ratings on their clients immediately after the first
interview and also immediately after the last inter-
view. In the large majority of cases, ratings were
dated between 1 and 3 days after the relevant in-
terview. The number of weeks was computed from
the number of days lapsing between initial and final
dates of ratings.
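
The derivation of the two length measures can be sketched as follows. Rounding to the nearest whole week and the use of base-10 logarithms are assumptions, since the text specifies neither; the dates and function names are illustrative.

```python
import math
from datetime import date


def weeks_between(first_rating: date, last_rating: date) -> int:
    """Rounded number of weeks between the initial and final rating dates."""
    days = (last_rating - first_rating).days
    if days < 0:
        raise ValueError("final rating precedes initial rating")
    return round(days / 7)


def log_length(count: int) -> float:
    """Log (base 10, by assumption) of a number of interviews or weeks."""
    if count <= 0:
        raise ValueError("length must be positive to take a logarithm")
    return math.log10(count)


# Hypothetical case: ratings dated 56 days apart, 30 interviews held.
weeks = weeks_between(date(1957, 1, 7), date(1957, 3, 4))
print(weeks, round(log_length(weeks), 3), round(log_length(30), 3))
```

Note that a case whose ratings fall less than four days apart would round to zero weeks and could not be log-transformed; how such cases were handled, if any occurred, is not reported.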


Two measures of integration movement were used.
The first measure was taken only at the end of
therapy, thus involving the counselor’s long-term
memory. The counselor was asked to answer two
questions. The first was: “What change has there
been in the client’s feelings toward himself?” Four
response alternatives were provided, ranging from
“more discontented” through “much more con-
tented.” Scores of 1 through 4, respectively, were
assigned. The second question was: “How much
change in the client as a person has occurred since
he started counseling?” Four response alternatives
were provided, ranging from “not changed” through
“changed a good deal.” Scores of 1 through 4, re-
spectively, were assigned. The sum of the scores on
these two questions constituted the first measure of
integration change. It will be called the posttherapy
estimate of change in integration (PECI).

The second measure of integration change was a
difference score between two ratings, one made after
the first interview, one made after the final inter-
view. Thus, on each rating occasion, only the coun-
selor’s short-term memory was involved. Ratings
were made on a 10-point scale, ranging from “most
extreme maladjustment” through “optimal adjust-
ment (fully functioning, optimal maturity).” A score
of 1 was assigned to the most maladjusted end, a
score of 10 to the optimal adjustment end. Using
this scale, the counselor was asked to indicate his
estimate of the client’s present psychological adjust-
ment. The score for his estimate after the initial in-
terview was subtracted from the score for his esti-
mate after the final interview to yield the second
measure of change. This second measure will be
called the difference measure of change in integration
(DMCI).
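
The arithmetic of the two change measures can be stated compactly. The sketch below follows the scoring as described (two questions scored 1 through 4 and summed for PECI; final minus initial rating on the 10-point scale for DMCI); the function and parameter names are illustrative only.

```python
def peci(feelings_change: int, person_change: int) -> int:
    """Posttherapy estimate of change in integration:
    sum of two posttherapy questions, each scored 1 through 4."""
    for score in (feelings_change, person_change):
        if score not in range(1, 5):
            raise ValueError("each question is scored 1 through 4")
    return feelings_change + person_change


def dmci(initial_rating: int, final_rating: int) -> int:
    """Difference measure of change in integration:
    final adjustment rating minus initial rating, each on a 1-10 scale."""
    for rating in (initial_rating, final_rating):
        if rating not in range(1, 11):
            raise ValueError("ratings are made on a 10-point scale")
    return final_rating - initial_rating


# Hypothetical client: question scores 3 and 4; ratings 4 (first) and 7 (last).
print(peci(3, 4), dmci(4, 7))
```

PECI therefore ranges from 2 to 8 and is never negative, whereas DMCI ranges from -9 to 9 and can register deterioration.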


In addition to the above measures, the counselor’s 
nine-point rating of success of the therapy was taken 
for this research. The scale has been used for many 
years at the Counseling Center, and was employed 
also by Standal and van der Veen (1957). The score 
of 9 means marked success.

The reliability and validity of the measures PECI 
and DMCI are not known independently of the pres- 
ent study. However, it will be shown in the results 
below that both have strong correlations with the 
success rating and with each other. The success rat- 
ing scale has previously been shown to have sub-
stantial reliability and validity (Cartwright, 1955).


RESULTS 


The comparability between the samples 
studied by Standal and van der Veen (1957) 
and the present writers is very good. Notably, 
the mean length of therapy was 30.7 inter- 
views (SD = 32.5) in the former study, 29.5 
(SD = 28.1) in the present study. Both sam- 
ples have a slightly greater proportion of 
male than female clients, and of student than 
community clients. For both studies male and 
female therapists were employed. 

The data basic to replicating the major re- 
sults of Standal and van der Veen (1957, p. 
9) are included in Table 1, which shows in- 
tercorrelations of all the measures taken in 
the present study. 

First, both PECI and DMCI correlate posi-
tively and significantly (p < .001 and p <
.01, respectively) with log number of inter-
views. Thus, the first major conclusion of
Standal and van der Veen (1957), that 
“Change in level of personal integration 
has a moderate linear relationship with log 
case length” (p. 9), is supported. 
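
The kind of analysis behind this conclusion, a Pearson correlation between a change score and the logarithm of case length, can be sketched as below. The data are invented solely to make the example run, the base-10 logarithm is an assumption, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_with_log_length(interviews, change_scores):
    """Correlate change scores with log10 of the number of interviews."""
    return pearson_r([math.log10(n) for n in interviews], change_scores)


# Invented data: change rises by one point per tenfold increase in length.
interviews = [1, 10, 100, 1000]
change = [0, 1, 2, 3]
print(round(correlation_with_log_length(interviews, change), 3))
print(round(pearson_r(interviews, change), 3))
```

In these invented data the relationship is exactly linear in the logarithm, so the first correlation is perfect while the correlation with the raw number of interviews is lower.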


TABLE 1

INTERCORRELATIONS OF TWO MEASURES OF LENGTH OF
THERAPY, TWO MEASURES OF CHANGE IN PERSONAL
INTEGRATION, AND A RATING OF SUCCESS
(N = 87)

Columns: Log Number of Weeks, Log Number of Interviews, PECI, DMCI
Rows: Log Number of Interviews, PECI, DMCI, Success
[Correlation coefficients not recoverable from the source.]

Second, the success rating correlates posi-
tively and significantly (p < .001) with log 
number of interviews. This finding accords 
with that of Standal and van der Veen (1957, 
p. 6), but the relative sizes of the correlations 
for success rating and for change in personal 
integration with log number of interviews 
differ in the two studies. Whereas Standal and 
van der Veen found the Pearson correlation 
for change in personal integration to be .58, 
and that for success rating to be .37, in the
present study the order is reversed for both 
measures of change in personal integration as 
compared with success rating. Thus, the sec- 
ond major conclusion of Standal and van der 
Veen (1957), that “Change in level of per- 
sonal integration is more highly related to 
case length than change or outcome on other 
important case variables” (p. 9), is not sup- 
ported. This finding also lends no support to 
their fourth major conclusion, that “With re- 
spect to actual amount of therapy, change in 
personal integration may be more important 
than rated success or other case variables”
(p. 9). At this time, one can say only that 
length of therapy is positively related to sev- 
eral measures of outcome or change. 

It should be noted that the above results 
hold for both a measure of change in personal 
integration which relies to some extent on the 
counselor’s long-term memory (PECI) and a 
measure of change which does not rely on 
long-term memory (DMCI). Examination of 
the correlations between log number of weeks 
and the three case variables in Table 1 shows 
that the two measures taken only at post- 
therapy (PECI and the success rating) have 
significant positive correlations, while the dif- 
ference measure which does not rely on long- 
term memory has a nonsignificant correlation. 
Since it makes little sense to partial out num-
ber of weeks from number of interviews, the
evidence in Table 1 must be taken as it stands 
to suggest that sheer length of acquaintance 
does have some influence on the counselor’s 
ratings when these ratings involve his use of 
long-term memory. 

The question arises whether it is possible to 
show somewhat more conclusively the postu- 
lated effects of long-term memory on the 
counselor ratings of change made at the end 
of therapy. The first thing that may be noted 
from the reliability data presented by Standal
and van der Veen (1957, p. 5) is that the 
rate-rerate reliability coefficient for personal 
integration at the beginning of therapy, as 
rated at the end of therapy, was .50 (not sig-
nificant); while the comparable coefficient for
the termination of therapy was .68 (signifi- 
cant at the .05 level). It is also noteworthy 
that in their discussion of reliability they re- 
ported that 34 months later, certain coun- 
selors could not remember well enough to 
make reratings on certain items. The present 
concern is whether the counselors at the time 
of their first rating could remember enough 
about the beginning of therapy to make valid 
ratings. It was suggested above that with long 
cases, the counselors might have been suffi- 
ciently hazy in their long-term memory to be 
rating essentially on a guessing basis with a 
bias toward underestimating the level of inte- 
gration shown by clients at the beginning of 
therapy. To examine this issue, the original 
data for the 72 clients reported on by Standal 
and van der Veen (1957) in regard to ratings 
of personal integration were re-examined along 
with the ratings on DMCI for the present 
sample. These authors report a Pearson cor- 
relation of .67 between the success rating and 
change on personal integration. Table 1 shows 
the Pearson correlation of success rating and 
DMCI to be .68. 

The two scales are highly comparable. They 
have closely similar wording. The first has 9 
steps, the second has 10. Further, it was found 
that the variances were not significantly dif- 
ferent. Inspection of the distributions and of 
the wording for the bottom point suggested 
that the scales could be considered essentially 
equivalent if the unused bottom step of the 
10-point scale was dropped and the other 
steps renumbered accordingly. 

Table 2 summarizes the comparisons be- 
tween the ratings for the two studies when 
the scale used in the present study is treated 
as a nine-point scale. 

Table 2 indicates that for the two samples, 
the difference between the posttherapy ratings 
is not significant while the difference between 
the pretherapy ratings is highly significant. 
(Even if the latter difference is reduced by 
.31, the amount of the posttherapy difference,
the t-value is still very high, 3.85.) Thus, for
TABLE 2

COMPARISON OF MEAN RATINGS OF PERSONAL
INTEGRATION FOR TWO STUDIES

Study                    Period Rated   N     M      SD    t
Standal & van der Veen   Beginning      72    3.1    1.4   5.22**
Present Study            Beginning      87    4.3    1.4
Standal & van der Veen   End            72    5.49   1.7   1.22*
Present Study            End            [N, M, and SD illegible]

* p < .20
** p < .001


comparable samples, there is no difference for 
the posttherapy ratings which were based on 
short-term memory. For the ratings of inte-
gration at the beginning of therapy, however,
the mean rating of integration is significantly 
lower for the sample studied by Standal and 
van der Veen (1957). This result is in accord 
with the expectation that counselors relying 
on long-term memory would tend to under- 
estimate the degree of personal integration 
shown by their clients at the beginning of 
therapy. 
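
The comparison of beginning ratings can be approximated from the summary figures in Table 2. The sketch assumes an independent-groups t test with pooled variance, which the text does not specify, and reads the present study's beginning mean as 4.3. Because the tabled means and standard deviations are rounded, the result should be expected to approximate, not reproduce, the reported value of 5.22.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-groups t from summary statistics (pooled variance).

    Returns the t statistic (group 2 minus group 1) and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / standard_error, df


# Beginning-of-therapy ratings as read from Table 2 (rounded values).
t, df = pooled_t(72, 3.1, 1.4, 87, 4.3, 1.4)
print(f"t = {t:.2f}, df = {df}")
```

The value obtained this way is somewhat larger than the published one, a discrepancy that may reflect rounding in the table or a different computational formula.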


DISCUSSION 


The absolute size of the correlations ob- 
tained in this study between log number of 
interviews and measures of change in per- 
sonal integration was not very great, even 
though the latter judgments may be influ- 
enced by knowledge of the former. While it 
does not seem likely that number of inter-
views can be used as a clinically meaningful
dependent variable on its own, it does bear 
useful relations to a number of important 
measures of change taken from counselor 
judgments, and these relations appear to be 
quite stable over the two samples studied. It 
is clear from the present findings, however, 
that considerable caution must be exercised 
when employing counselor judgments to ob- 
tain such estimates or estimates of any vari- 
able. In particular it appears important to 
pay careful attention to the conditions under 
which counselor judgments are obtained, es- 
pecially in regard to the time span over which 








they are called upon to exercise their memo- 
ries. 

The question of whether log number of 
weeks or log number of interviews is the better 
measure of length of therapy cannot be given 
a general answer from the present study. So 
far as the evidence does go, it appears that log 
number of interviews shows the higher cor- 
relations with measures of change in personal 
integration and success of therapy when these 
are taken from counselor judgments. How- 
ever, it also seems that a spurious length-of- 
acquaintance factor may be contributing to 
those higher correlations when the measures 
are taken from counselor judgments made 
only at the termination of therapy. All in all, 
the best procedure at the present time would 
seem to be offered by the use of log number 
of interviews in conjunction with judgments 
made both at the beginning and at the termi- 
nation of therapy. 


SUMMARY 


Data for 87 clients seen by client-centered 
counselors were examined in order to replicate 
certain analyses made by Standal and van 
der Veen (1957) on a similar sample. It was 
confirmed that counselor rating of movement 
on personal integration bears a linear rela- 
tionship to log number of interviews. In con- 
trast to the earlier results, the present study 
found that the counselor success rating had 
a higher correlation with length of therapy 
than did rated movement on personal inte- 
gration. An alternative measure of length of 
therapy was also employed in the present 
study, namely log number of weeks. Correla- 
tions with movement and success were uni- 
formly smaller for this alternative measure of 
length. 

Memory factors influencing the counselors’ 
judgments were examined by use of two meas- 
ures of change in personal integration, one 
calling upon the counselor to rate change at 
the end of therapy only, the other calling 
upon him to rate the level of integration he 
sees in the client after the first interview and 
after the final interview, change being calcu- 
lated from the difference between the initial 
and final ratings. The results showed that, 
when counselors’ ratings of change are made 
only after termination of therapy, they are 
influenced by the sheer length of acquaint- 
ance with the client. It was also hypothesized 
that in the Standal and van der Veen (1957) 
study, counselors who, after the termination 
of therapy, rated the initial level of personal 
integration of the client would have been op- 
erating on such long-term memory as to in- 
volve considerable guesswork coupled with a 
bias to underestimate the client's initial level 
of integration. This hypothesis was tested by 
comparing the mean rating of initial integra- 
tion in the earlier study with the mean rating 
of initial integration in the present study 
when counselors were rating from short-term 
memory. The result supported the hypothesis. 
It was concluded that the use of log num- 
ber of interviews together with judgments 
made both at the beginning of therapy and 
at the termination appears to be the best 
present procedure for examining the relations 
between length of therapy and case variables 
obtained from counselor judgments. 

Hannah (1958) modified the Bender Visual 
Motor Gestalt Test (BG) by rotating the de- 
signs through 90 degrees on their rectangular 
cards, presenting the design to the subject as 
before. With this new set of cards patients 
produced fewer rotations of figures, presum- 
ably because the longer axis of the card now 
corresponded to the longer axis of the paper. 
However, his statistically significant differ- 
ence was due to a few of his controls pro- 
ducing multiple rotations; examining his sta- 
tistics it becomes evident that just as many 
patients in one group as in the other rotated 
at least one figure (8 out of 36 in each case). 
The present study is essentially a replica- 
tion of his; however, instead of redesigning 
the cards, a comparable effect was attained 
through the expedient of rotating the paper, 
card and paper being thus oriented length- 
wise left-to-right instead of up-and-down as 
in his study. 

Examiners within a large neuropsychiatric 
hospital were asked to rotate the tablet when 
administering the BG. Habits being hard to 
break, not all did so. Those who did not com- 
ply unwittingly collected a “control” group of 
records, which, as it turned out, matched the 
experimental as to diagnosis. As the psycho- 
logical reports crossed the secretary’s desk ro- 
tations were noted, the study continuing over 
a 10-month period. An angular displacement 
of at least 45 degrees in a recognizable figure 
was the criterion for rotation. 

Fifty-six “tablet-turned” records were ob- 
tained, 157 conventional ones. Under the 
modified conditions 12.5% of the records had 
one or more figure rotations vs. 29.3% under 
the conventional, the two proportions differ- 
ing significantly at the .02 level (one-tailed 
test). A chi square between the distribution 
of diagnoses in the two groups (five major 
diagnostic categories being considered) was 
small—0.650 for 4 degrees of freedom—per- 
mitting the conclusion that the groups were 
well matched according to diagnosis. 

The 29.3% rotations in the standard rec- 
ords were unaccountably higher than the 22.8% 
previously determined from approximately 
1,000 records in the files of the same hos- 
pital (Griffith & Taylor, 1960). After a chi 
square test had shown that there had been 
no statistically significant shift in diagnoses, 
all the data collected under standard condi- 
tions were combined for a total number of 
1,152 tests—23.5% with one or more figure 
rotations. This 23.5% differed just at the .05 
level of statistical significance from the 12.5% 
of records with rotations in the unconven- 
tional, tablet-turned group (one-tailed test). 
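
The report does not say which test produced the .02 and .05 levels. As a rough check, the sketch below applies a pooled two-proportion normal approximation with a continuity correction; this choice of test is an assumption, not the author's stated method. The counts (7 of 56, 46 of 157, 271 of 1,152) are reconstructed from the reported percentages and are likewise assumptions, and the function name is hypothetical.

```python
import math


def one_tailed_two_proportion_test(x1, n1, x2, n2, continuity=True):
    """Return (z, one-tailed p) for the hypothesis that group 2 has the
    larger proportion, using the pooled normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = p2 - p1
    if continuity:
        # Continuity correction for comparing two proportions (assumed here).
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    z = diff / se
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Counts reconstructed from the reported percentages (assumptions):
# tablet-turned 7/56 (12.5%), conventional 46/157 (29.3%),
# combined standard records 271/1152 (23.5%).
z_a, p_a = one_tailed_two_proportion_test(7, 56, 46, 157)
z_b, p_b = one_tailed_two_proportion_test(7, 56, 271, 1152)
print(f"turned vs. conventional: z = {z_a:.2f}, one-tailed p = {p_a:.3f}")
print(f"turned vs. combined:     z = {z_b:.2f}, one-tailed p = {p_b:.3f}")
```

Under these assumptions the one-tailed probabilities fall below .02 for the first comparison and below .05 for the second, in line with the levels reported above.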

Hannah’s results would seem to be con- 
firmed. It may be concluded that many ro- 
tations are caused by the patient orienting 
the design to the major axis of the paper in 
the same relation it bears to the major axis 
of the card, even though to do so involves 
actually turning the design in relation to him- 
self. The results fit into the pattern of in- 
vestigations begun by Shapiro (see Williams, 
Lubin, Gieseking, & Rubinstein, 1956) which 
relate the phenomena of rotations of both 
block designs and BG figures to stimulus 
properties of figure and ground. However, it 
should be pointed out that however success- 
ful we may be in pinpointing the stimulus 
variables which influence rotations, rotations 
do not thereby lose their diagnostic signifi- 
cance; as long as different diagnostic groups 
are influenced differently by the stimulus con- 
ditions, as they seem to be (Griffith & Taylor, 
1960), the rotation will still have diagnostic 
significance. 

To sum up, it was confirmed, through a 
replication of a previous study, that many of 
the rotations of the Bender-Gestalt figures 
may be attributed to the accidental circum- 
stance that the long axis of the test card is 
oriented at 90 degrees to the long axis of the 
paper upon which the figure is usually drawn. 

High correlations have typically been found 
between a group’s self-ratings on an array of 
items and the social desirability of those items. 
Taylor (1959) has challenged the conclusion that 
each S’s self-ratings reflect only his desire to 
produce a favorable self-picture. 

The present study replicates Taylor’s design in 
comparing individual and grouped data, but uses 
normal Ss, a shorter time interval between rat- 
ings, different instructions, and a different rating 
instrument, one whose test-retest reliability for 
self-ratings had been studied.² In addition, it at- 
tempts to manipulate the desirability set in indi- 
viduals by exposing them to a personally relevant 
ideal prior to obtaining their self-ratings; presum- 
ably this exposure would enhance the desirability 
set. 

Eighty incoming freshman male medical students 
ranked the definitions of the 15 Murray needs 
given by Edwards (1957) from most to least 
characteristic of themselves, and in a separate 
ranking, from most to least characteristic of 
successful physicians. Forty Ss ranked the items 
first for themselves and immediately thereafter 
for physician (Group S-P), while the other 40 
followed the reverse sequence (Group P-S). 

In Group S-P, where the rating sequence was 
comparable to Taylor’s, the pattern of results 
was similar to his. Rank-order correlation be- 
tween average ranks assigned to the items for 
self and for physician was .89, while the median 
of the individual correlations was only .63, with 
[number illegible] of 40 Ss having correlations 
below .44, the .05 significance level. These results 
support Taylor’s contention that for the self-ratings 
of a substantial portion of Ss, factors other than 
social desirability set are operative. 

1 An extended report of this study may be ob- 
tained without charge from Norman Milgram (602 
South 44 Avenue; Omaha, Nebraska) or for a fee 
from the American Documentation Institute. Order 
Document No. 6412 from ADI Auxiliary Publica- 
tions Project, Photoduplication Service, Library of 
Congress; Washington 25, D. C., remitting in ad- 
vance $1.25 for microfilm or $1.25 for photocopies. 
Make checks payable to: Chief, Photoduplication 
Service, Library of Congress. 

2 The scale was administered to 14 nursing stu- 
dents on two occasions one week apart; median in- 
dividual correlation was .86. 

In the group receiving reversed-order instruc- 
tions (Group P-S), however, the median indi- 
vidual correlation (.85) was significantly higher 
than that in Group S-P (p < .01, median test) 
and close enough to the correlation based on item 
means (.95) to suggest that little but the desir- 
ability set was operating in these Ss; only two 
individual correlations in this group fell below 
.44. Apparently making physician ratings first en- 
hanced the desirability set in the subsequent rat- 
ings of self. 
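
The individual rank-order correlations discussed above can be illustrated with Spearman's formula for untied ranks, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), here with n = 15 items. The rankings in this sketch are hypothetical, not data from the study, and the function name is an assumption.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for two untied rankings of the
    same n items, each given as a permutation of 1..n."""
    n = len(ranks_a)
    if n < 2 or len(ranks_b) != n:
        raise ValueError("rankings must have the same length (at least 2)")
    expected = list(range(1, n + 1))
    if sorted(ranks_a) != expected or sorted(ranks_b) != expected:
        raise ValueError("each ranking must be a permutation of 1..n")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical rankings of the 15 need definitions by one S:
# once for self, once for "successful physician".
self_ranks = list(range(1, 16))
physician_ranks = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 15]

rho = spearman_rho(self_ranks, physician_ranks)
# .44 is the .05 significance level cited in the text for these correlations.
print(f"rho = {rho:.3f}; exceeds .44: {rho > 0.44}")
```

Computing one such rho per S and taking the median gives the individual-level figure, whereas the group-level figure correlates the average ranks assigned to each item.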

That occupying the second position in the in- 
struction sequence modified the self-ratings in 
Group P-S, and not the physician ratings in 
Group S-P, is indicated by an additional finding: 
self-ratings in Group P-S were more uniform 
than in Group S-P, while there was no difference 
in physician ratings. When each S’s self-rating 
was correlated with the mean self-rating for his 
group, the median rho for Group P-S was .80 
and for Group S-P .62, the median test being 
significant at the .01 level. Physician ratings 
were higher and equally uniform for P-S and 
S-P groups, the median rho’s being .87 and .85, 
respectively. 


In addition to corroborating Taylor’s findings, 
the present study provides evidence that the de- 
sirability set in self-ratings can be enhanced by 
simply having Ss make desirability ratings first. 

This study was concerned with the effects of 
three varying affective stimuli on discriminative 
performance of schizophrenic and normal sub- 
jects (Ss). It was hypothesized that the apparent 
ineffectiveness of schizophrenic Ss in discrimina- 
tion tasks is related to certain motivational fac- 
tors within the stimulus situation. 

Twenty male schizophrenic Ss and an equal 
number of normal Ss were used in this study. 
Three series of pictorial stimuli were selected 
corresponding to the dimensions of positive, nega- 
tive, and neutral affective states. Each scene was 
represented by five pictures. Two of the series 
consisted of a social situation involving a female 
figure whose face and hands were clearly in evi- 
dence and a three-fourths rear profile view of a 
young child in the foreground. The third series 
consisted of a geometric design with the intent 
that these pictures were to represent a minimal 
amount of any given affective quality. The series 
categorized as negative affective contains the 
theme of reprimand by the central female figure 
with respect to the child; the positive affective 
series contains the theme of acceptance and de- 
sire for closeness on the part of the woman with 
respect to the child. Size, position, and general 
physical characteristics of the characters were 
held constant in both affective series, except for 
a progressive alteration of the facial expression 
of the central figure and the change in the posi- 
tion of her hands, from those representing 
closeness to those representing rebuff. 

1 This paper is derived from a doctoral disserta- 
tion submitted in partial fulfillment of the require- 
ments for the degree of PhD, Boston University 
Graduate School, 1955. Grateful acknowledgment is 
due L. J. Reyna and J. V. Gilmore for their help 
and guidance. 

An extended report of this study may be obtained 
without charge from Milton Turbiner (Box 326, 
Veterans Administration Hospital; Northport, New 
York) or for a fee from the American Documenta- 
tion Institute. Order Document No. 6411 from ADI 
Auxiliary Publications Project, Photoduplication 
Service, Library of Congress; Washington 25, D. C., 
remitting in advance $1.75 for microfilm or $2.50 
for photocopies. Make checks payable to: Chief, 
Photoduplication Service, Library of Congress. 

The instructions required that upon simultane- 
ous presentation of the pairs of stimuli of each 
series the Ss were to indicate whether the moods 
expressed in the central figure in both pictures 
were the same or different. Similar instructions 
were given for the discrimination of the geo- 
metric series. The scores obtained for each S 
consisted of the frequency with which the S re- 
sponded “same” to a pair of different pictures 
and “different” to a pair of identical scenes. 
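
The scoring rule just described amounts to counting mismatches between each response and the true status of the pair. A minimal sketch follows; the trial records are hypothetical and the function name is an assumption.

```python
def discrimination_errors(trials):
    """Count errors over (pair_is_identical, response) trials, where the
    response is "same" or "different". An error is "same" to a pair of
    different pictures or "different" to a pair of identical pictures."""
    errors = 0
    for pair_is_identical, response in trials:
        if response not in ("same", "different"):
            raise ValueError(f"unexpected response: {response!r}")
        said_same = response == "same"
        if said_same != pair_is_identical:
            errors += 1
    return errors


# Hypothetical record for one S on one series.
example_trials = [
    (True, "same"),        # correct
    (True, "different"),   # error: identical pair judged different
    (False, "same"),       # error: different pair judged same
    (False, "different"),  # correct
]
print(discrimination_errors(example_trials))
```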

Coincidental with the writing of the disserta- 
tion from which this brief report is derived, 
Dunn (1954) published a study based upon a 
similar hypothesis and research design. The find- 
ings of this study are generally consistent with 
those reported by Dunn (1954). It was found 
that the performance of the schizophrenic group 
was less effective in contrast to that of the nor- 
mals with respect to negative as well as posi- 
tive affective stimuli discrimination. However, 
their performance was indistinguishable from that 
manifested by the normal group with respect to 
the neutral stimuli. This is a clear indication of 
a capacity common to both normal and schizo- 
phrenic groups, which, under the predicted conditions, 
was not utilized effectively by the schizophrenic 
group as stimulus conditions changed. 